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Digital encoding of speech to allow more efficient 
transmission at low data rates involves the decomposition 
of the speech waveform into various parameters which are 
related to the physical structure of the speech production 
process. In this thesis, linear predictive coding is used 
to produce a set of coefficients for the character i st i c 
polynomial of sucessive 25 msec, segments of the voice 
track, in the z-domain. The location of the poles in the 
z-plane and the excitation pitch period are then shifted 
and the signal reformulated to cause changes of the overall 
frequency characteristics of the speech waveform, v/hile 
maintaining the perceived sounds and information content. 
The resulting audio tapes confirm the theory and 
conjectures of the thesis. 
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I. INTRODUCTION 



Digital processing of speech signals has become 

important and necessary with the introduction of high-speed 

* 

digital devices into every phase of communication: place to 
place; man to machine; and machine to man. 

Digital signals have a number of inherent advantages 
over analog signals. Digital signals may be coded for 
security or for noise immunity. A digital voice signal may 
be transmitted by the same equipment used for data and it 
may be multiplexed with that data. One of the primary 
disadvantages of the digital transmission of voice is the 
large bandwidth required with some digital techniques. 

When analog techniques, such as single side-band amplitude 
modulation, produce bandwidths of 5KHz and the best digital 
system bandwidth was 64khz, there was a very strong 
tendency to stay v/i th the analog techniques. 

However, recent advances in digital signal processing 
have made the digital transmission of voice highly 
efficient. Until recently digital transmission of soeech 
was possible only by sampling the voice waveform at a 
sufficiently high rate and then performing an 
ana 1 og- to-d i g i ta 1 conversion of each sample. A sufficient 
number of bits were transmitted for each sample which was 
sent to reconstruct the waveform at the reciever. The 
voice waveform must be sampled at aproximately 8,000 
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samples per second to avoid the loss of clarity. Each of 
the samples must then be converted to a 6-10 bit number for 
transmission. The overall data rate using these methods had 
a lower limit in the neighborhood of 48,000 bits per 
second . 

Recent developments have allowed the voice pattern to 
be broken down into more basic parameters which are closely 
associated with the physical production of speech. These 
parameters vary rather slowly and can be transmitted at a 
lower rate. Data rates as low as 1200 bits per second have 
been achieved through the use of these techniques. 

These methods are numerical representat i ons of the 
physical production of speech, and therefore it is easier 
to alter the cha rac te r i s t i cs of speech by altering the 
associated parameters then by trying to alter the waveform 
directly. 

This thesis reviews various digital speech processing 
techniques for use in a speech modification system. Linear 
predictive coding (LPC) w as chosen for implementation and 
therefore the theory and practice of this techniaue are 
explained in detail. The desired modification of the 
speech waveform by shifting the poles of its cha rac te r i s t i c 
polynomial, and the regeneration of the altered waveform 
are discussed and the implementation techniques explained. 
The IBM 360 computer was used for simulating the techniaues 
developed. This simulation is covered in detail and the 
computer programs, with results, are provided. 
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M. SPEECH PRODUCTION AND CHARACTERISTICS 



Any digital system for altering speech characteristics 
must be based on knowledge of those cha rac te r i s t i cs and the 
physical structure which determines them. 

A. SPEECH CHARACTERISTICS 

All speech can be broken down into a set of distinctive 
sounds called phonemes. In the case of American English, 
there are generally considered to be 42 distinct phonemes 
which are classified into vowels, diphthongs, semivowels 
and consonants. Spoken communication is accomplished 
through various combinations of these sounds and the 
accurate reproduction of each is a major criteria in 
judging voice processing systems. Phonemes are generated 
at a rate of about ten per second. Each phoneme is 
classified as voiced if vocal cord vibration is the source 
of the sound or unvoiced if the sound is produced by other 
means. If the cha rac ter i s t i cs of a phoneme change from the 
start to finish, the phoneme is called noncontinuant. Those 
phonemes which are stationary are called continuant. 

The lowest frequency present in a given voiced sound is 
called the pitch frequency. There are peaks in the spectral 
rep resentat ion of a speech sound that are above the pitch 
frequency which are called formants and are numbered 
consecutively with increasing frequency. Although two 



10 



speakers may produce the same phoneme / the pitch and 
formant frequencies may be different. However, general 
re 1 a t i on sh i ps may be established between pitch and formant 
frequencies wh i ch are relatively constant from speaker to 
speaker, producing the same phoneme. If information is to 
be retained by a speech processing system, it must be able 
to reproduce at output, the pitch and formant frequency 
relationship which was present at the input. 

B. PHYSICAL SPEECH PRODUCTION STRUCTURE 

The vocal tract is a resonant tube v/i th the vocal cords 
at one end and the lips at the other. The vocal tract acts 
as a frequency selective filter which has a transfer 
function that depends on how it is shaped at any given 
time. 




(A) VOICED (B) UNVOICED 

FIGURE 1. SOUND PRODUCTION 

The input to the vocal tract is caused by either the 
vibration of the vocal cords at the lower end (figure l.a) 
or by the turbulence of air being forced through a 
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constriction at any of a number of locations along the 
vocal tract (figure l.b). The vocal tract acts as a filter 
with a pulsed input from the vocal cords when producing 
voiced sounds such as 'a' or 'o'. During sounds caused by 
the forcing of air through a constr i ct ion, fricative sounds 
like 's' or 'f', the vocal tract acts as a resonant cavity 
which will have certain cha rac te r i s t i c response 
frequencies. Typical waveforms for voiced and unvoiced 
sounds are shown in figure 2. 





UN VO ICE D 

FIGURE 2. TYPICAL WAVEFORMS 



Certain cha racter i s t i cs of the vocal tract are changed 
several times per second to produce different sounds while 
others such as overall length and the diameter range limits 
are fixed for a given speaker. A detailed look at each of 
the types of sounds will insure that the digital processor 
used has the same flexibility as the actual speaker. 

Vowels, voiced continuant sounds, are produced when 
the vocal cords vibrate causing pulses of air at the bottom 
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of the vocal tract. The shape of the vocal tract remains 
fixed during vowel production, acting as a stationary 
filter to respond to the forcing function. 

The production of diphthongs and semivowels is similar 
to that of vowels except that the shape of the vocal tract 
is smoothly changed during voicing. Diphthongs and 
semivowels are noncontinuant, voiced sounds. 

The phonemes classified as consonants may actually be 
further divided into subca tago r i es of voiced fricatives, 
unvoiced fricatives, stops and nasals. Fricatives are 
caused by the steady flow of air through a constriction in 
the vocal tract which causes turbulant air motion and a 
seemingly random air pressure pattern. Fricatives are 
voiced or unvoiced depending on whether the vocal cords are 
producing pressure pulses at the same time. Stops or 
plosives are caused by completely closing the vocal tract 
and then suddenly opening it to quickly start sound 
production. A stop is classified as voiced or unvoiced 
depending on the nature of the sound that follows the 
opening of the vocal tract. Nasals are voiced sounds which 
are formed when the vocal tract is closed and air is 
allowed to pass through the nasal cavity. This acts as a 
feed forward path for the sound and a corresponding change 
is caused in the total vocal tract response. 

C. INFORMATION CONTENT 

One of the primary goals of speech processing is the 
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development of efficient codes for transmitting or storing 
speech and still allowing it to be recons t rue ted without 
excessive loss of information. The source coding theorem 
states that through the proper choice of coding we can code 
a source into a bit sequence arbitrarily close in length to 
the entropy of that source. However, efficient codes are 
difficult to find for even simple binary sources, let alone 
a continuous speech source. An estimation of the entropy of 
a typical speech source provides a useful guage for 
measuring the data rate performance of any system. 

If 'excessive loss of information' occurs only when we 
don't receive the correct one of the 42 phonemes, the 
information content of one second of speech is 
approximately (assuming 10 phonemes are produced per 
second ) : 

42 

H ■ 10 Z P(p.) (-log P( p . ) ) 
i =1 

where P(p.) is the probability of the i th phoneme. Assuming 
further that each phoneme is equally likely, 

H = 10 x 42 x 1/42 x log 42 = 54 bits per second 

If the actual probability of each phoneme was used, i.e. 
they are not equally likely, the value of entropy would be 
significantly lower. 

If 'excessive loss of information' also includes 



failure to identify the speaker and failure to indicate the 
speaker's emotional state the information content is 
higher. However if we assume that identification of the 
speaker (one of about two billion) is only reauired once 
per minute and that the speaker's emotional state (say one 
of ten) can only change once per second the entropy is 
still only 58 bits. 

H(speaker) = 1/60 x 10 x 1/10 x (~log(l/10 )) = 0.5 

H(emotion) = 10 x 1/10 x (-log (1/10)) = 3.3 

H(phoneme) = 54 bits per second 

H(total) = 58 bits per second 

Clearly the theoretical limit is not being pushed by the 
current state of the art in speech coding. 
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III. DIGITAL SPEECH PROCESSING TECHNIQUES 



Digital speech processing techniques may be placed into 
three general categories based on the assumptions used in 
their development. The first category is that of waveform 
techniques where the only primary assumption is that the 
signal wh i ch is being processed is frequency limited to no 
more than half of the sampling freauency. The second 
category of spectral methods adds the assumption that the 
frequency domain character i st ics of the speec' n waveform 
vary slowly. Finally, the voice tract parameter techniaues 
assume that the physical voice production system can be 
mode led d i g i ta 1 1 y . 

A. WAVEFORM METHODS 

'Waveform techriques have the charscteri st ic of 
operating equally well on any low-pass filtered waveform 
and all are generally based on the familar pulse code 
modulation. The basic requirements of a waveform 
quantization method is that the waveform be sampled at 
greater than twice the highest frequency present and that 
the samples be quantized into a digital code for 
transmission. Although this technique is very straight 
forward, it also requires a high data rate. A waveform 
sampled 9600 times per second with each sample quantized to 
256 levels would require 76,300 bits per second for 
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transm i ss ion . A number of variations (differential 
modulation and adaptive differential modulation) have been 
used to reduce the required data rate but have failed to 
cut the required data rate by more than about half. 

B. SPECTRAL TECHNIQUES 

1 • Short Term Frequency Analysis 

These methods deal with the short-term freauency 
properties of the speech signal. An early spectral method 
was the channel vocoder. The transmitting processor of the 
channel vocoder consists of a bank of narrow-band analog 
filters. The energy passed by each filter is measured and 
transmitted to the receiver site. It is also determined 
whether the input speech wa s voiced or unvoiced and that 
determination is transmitted. In the receiver an 
excitation signal/ determined by the voicing decision/ was 
fed into a bank of narrow-band filters, each of which had 
an adjustable gain determined by the received energy 
measu remen ts . 

The same technique can be implemented in an all 
digital method by replacing the bank of analog filters with 
digital filters or by performing a discrete Fourier 
t ransf o rma t i on (DFT) on a frame of input samples. The use 
of the DFT is usually preferred because of computational 
efficiency and the availability of high-speed DFT array 
processors. Normally each input frame is windowed to 
reduce the noise which can be caused by a sharp cut off at 
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the end of a frame. When this method is used to reduce the 
data rate required for digital transmission, the total DFT 
of each frame is not transmitted because the total DFT 
would require the same number of bits as the frame of 
samples (assuming both are quantized to the same number of 
levels). Reduction in the data rate can be accomplished by 
skippirg frames and assuming they are duplicates of the 
preceeding frame during recons t ruct i on . The number of 
samples in the frame is also half the number of frequencies 
resolved by the DFT, therefore the frame length for 
analysis is choosen as a compromize between accuracy of 
voice reproduction and the desire for a 1 ow data rate. 

This method of speech processing wculd lend itself 
well to altering the frequency characteristics of voice 
signals but it reaui res a relatively high data transmission 
rate and therefore was not desirable for speech orocessing 
in conjunction with place to place communications or with 
digitally stored speech. 

2 . Homomorphic Processing 

Another method which involves freauency domain 
processing is homomorphic processing. It is based bn the 
following three principles: 

(1) Speech is the convolution of an excitation 
function and the transfer function of the vocal 
tract . 

(2) Convolution in the time domain is equivalent to 
multiplication in the frequency domain. 

(3) The Fourier transform is a linear 
transformat ion, i.e. 
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F ( x ( t ) +y ( t ) ) = F ( x ( t ) ) + F ( y ( t ) ) = X(w) + Y(w) 

A method of separating a speech waveform back into these 
components would help us analyze the speech. Homomorphic 
processing centers around the efficient deconvolution of 
these s i gna 1 s . 

First the input signal is windowed and transformed 
via the DFT, to produce the frequency domain representat i on 
of the input speech. The time convolution of two signals is 
equivalent to multiplication in the frequency domain. 
However knowing the product of two waveforms does little 
toward gaining knowledge of the multiplicands unless 
further information is given. The multiplication of the 
two values at a given frequency is eauivalent to adding the 
logarithms of each. The log is taken of each of the values 
in the frequency domain representation of the signal which 
is then equal to the sum of the the log of the frequency 
domain rep resen tat i on of the excitation function plus the 
the log of the frequency domain representat ion of the vocal 
tract function. However, it is easier to tell the 
difference between the vocal tract excitation functions in 
the time domain, so the inverse DFT is taken , of the log of 
the frequency domain function. The function produced is 
called the cepstrum of the signal. Because taking the 
inverse DFT is a linear function, and the frequency domain 
function was the sum of two component functions, the time 
domain cepstrum must also be the sum of the cepstrum of the 
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FIGURE 3. HOMOMORPHIC DECONVOLUTION 



excitation function and the cepstrum of the vocal tract 
function. Figure 3 illustrates the relationship between 
the steps of homomorphic deconvolution of signals. 

Examination of the cepstrum between 2.5 and 20 
msec, may reveal a peak that is considerably above the 
background noise level. If a peak is there, the segment is 
determined to be voiced with the peak occuring at the pitch 
period. The vocal tract is not long enough to sustain any 
vibrations for more than 20 msec, after a pulsed input. 

If there is no peak the segment is considered unvoiced. 

The cepstrum of the excitation function may be subtracted 
from the total cepstrum and the remainder considered an 
estimate of the cepstrum of the vocal tract transfer 
function. After working backwards to magnitude (vs. log of 
magnitude) in the frequency domain, the filter coefficients 
may be determined. 

It would be relatively straight forward to alter 
both the excitation function and the vocal tract transfer 
function after the total cepstrum is broken into its 
additive components. However, homomorphic processing was 
not being widely used for voice communication and this 
technique was dropped in favor of a more widely used 
system. As array fast Fourier transform processors become 
faster and less expensive, homomorphic speech processing 
may become the dominant speech communication technique. 
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C. VOICE TRACT PARAMETER TECHNIQUES IN THE TIME DOMAIN 
The primary cha racter i s t i c of this catagory is the 
close tie between the digital process and the physical 
structure being modeled. Although homomorphic processing 
uses the deconvolution of the vocal tract function and the 
excitation function as a primary tool, the homomorphic 
process does require transformations to and from the 
frequency domain and therefore is not included in this 
catagory. The primary member of this catagory is the linear 
prediction coding (LPC) process which has shown itself to 
be among the best and most versitile of the various speech 
processing techniques. 

1 . The Speech Model 

The speech model assumed and used for LPC is that 
of a tine-varying digital filter which is excited by a 
wide-band function, either a pulsed input or random noise. 
This is illustrated in figure 4. The recursive filter used 
to model the vocal tract is all-pole and has slowly time 
varying (pseudo-stationary) coefficients. The filter's 
z-domain transfer function is 

1 

P -i 

1 - 

i =1 
o r 




Y ( z ) = 

U ( z ) 
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Y(z) 



p i 

= U ( z ) + ( ^ a . z ) Y ( z ) 
i =1 

or in the discrete time-domain 

P 

Y(nT) = U(nT) + ^ a. Y( (n-i )T) 

i =1 

From the time domain equation it is clear that the current 
output Y(nT) is uniaueiy specified in terms of the current 
input and the past p output values. 




PHYSICAL MODEL 
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The vocal tract is not always best modeled by an all-pole 
filter, and particularly nasal sounds would probably be 
best modeled by a filter which also included zeros. Hov/ever 
there is considerable difficulty in rapidly estimating both 
coles and zeros of a transfer function when only a short 
segment of the output is available for analysis. However, 
experience has shown that high quality voice production is 
possible by using an all-pole filter of adequate order. 

The order of the filter required is closely related 
to the length of the vocal tract. To adequately represent 
the lower frequency response of the vocal tract, the filter 
must include recursive delay equal to the delay encountered 
by sound waves traveling from the vocal cords to the lips 
and returning to the glottis. 

velocity of sound = 344 m/sec 
length of vocal tract = 17 cm 

2 x 0.17 = 0.988 msec 
344 

At a sampling rate of 10kHz at least 10 past values would 
need to be included for an accurate model. 

The excitation function for voiced sounds in 
modeled by a train of pulses at the glottis. Clearly these 
pulses can not be a perfect set of impulses, but rather 
must have a finite width and are likely to have a definite 
shape. Rather than construct a separate filter to change 
the impulses into the correct shape, additional poles are 
added to the model so that the combined transfer function 
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may be calculated at once. Normally two additions poles are 
adequate for the pulse shape model. 

2 . Linear Predictive Techniques 

Linear predictive analysis is based on the division 
of speech mod e 1 i ng into mode ling of the excitation function 
and modeling of the vocal tract transfer function. The 
vocal tract is modeled by computing each sample as a 
we ighted linear combination of previous samples. Linear 
predictive coding of speech is accomplished by filtering a 
sampled speech waveform through a filter which is the 
inverse of the filter which models the vocal tract. If the 
filter used is the inverse of a good model of the vocal 
tract, the output will be a good approximation of the 
excitation function. The various properties of the 
excitation function, along with the coefficients used in 
the vocal tract filter are measured and transmi tted as 
shown in figure 5 . 



\ 
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FIGURE 5. ENCODING PROCESS 




The received measurements are used in the decoding 
processor to reconstruct the excitation function and the 
filter. The process of reconstructing the soeech waveform 
is shown in figure 6. 
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FIGURE 6. DECODING PROCESS 

The primary advantage in the use of linear 
predictive coding of speech is the reduction in the data 
rate required for transmission or storage. LPC systems have 
been developed which require data rates from 3000 to 4800 
bits per second for high quality voice communication and 
rates as low as 1200 bits per second have been reported for 
lower quality but understandable speech production. Highly 
efficient algorithms have been developed for the encoding 
and decoding of speech using the LPC technique. When 
hardware implemented with special purpose, short word 
length mi crop rocesso rs , the computations reauired for 
two-way communication have been done in 65% of real time. 

LPC was chosen as the method to be used for 
accomplishing the desired voice cha racte r i s t i c 
modifications. A detailed description of the theory and 
modeling assumptions follows. 
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IV. LINEAR PREDICTION THEORY 



Linear prediction is an extension of least squares 
estimation. In the case of one-dimensional linear 
prediction, it is more commonly labeled as time series 
analysis when used by statisticians for analysis of 
everything from population to the stock market. 

A. THEORY 

It is assumed that each sample of the discrete time 
series, s(kT), as shown in figure 7 may be approximated by 
a linear combination of past samples of the time series. 

m 

s(kT) s(Ck-i ) T) 

i =1 

where s(kT) is the estimated sample value, a. is the 

I 

coefficient of the sample i steps past and m is the order 
of the approximation (and as we will see later the order of 
the z-domain filter of the model). 




27 



For a portion of the discrete time series (N samples where 
N>m), a least squares approximation of the weighting 
coefficients, a., may be calculated. The estimate at each 
po i nt 



s(kT) 




s ( ( k- i )T) 

1 < k < m 



is subtracted from the actual sample value and the error 

for each estimate, e(kT) is given. 

e(kT) = s(kT) - s(kT) 

1 < k < m 



e(kT) 



s(kT) 




s ( ( k-i )T) 



i =1 

1 < k < m 



To minimize the error (in a least squares sense) the error 
is squared and summed over all points in the region of 
interest to obtain an overall error, E. 



■E 



k=l 



m 



CkT) = 



sCkT) 



k=l 



-Th 

i =1 



1 



s ( ( k-i ) T) 



The derivative of E with respect to each of 
coefficients, a., is taken and set equal to 
to locate the minimum of E. This yields the 
equati ons . 



the 

zero i n order 
following m 
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N 



m 



■I 



m 



= 0 = 

da 

J k=l 



however 



2 (s ( kT) a. s ( ( k - 1 ) T )! d_ (s ( kT) a.s((k-i)T) 

- ' da. 1 L* ' 



i =1 



i =1 

1 i j i m 



d_ fs ( kT) 

da. L 

j 



= 0 



and 



a r 

/Qa L 



a. sC(k-i ) T) 



da 



J - 0 , i * j 

= s((k-j)T), i = j 



therefore 



26 = 0 = E 



m 



s ( kT) - \ a. s( (k-i )T) 



k = l ^ i =1 



(-1) s ( ( k-j ) T) 



11 j < m 

removing the constant multiplier 
N N m 

0- V SCkT)s(Ck-j)T) - ^ ^ a.s((k— i)T)s(C k-j )T) 

k=l i =1 



k=l 



changing the order of summation 



1 1 j (m 




s(kT)s( (k-j )T) 



m N 




s ( ( k- i )T)s( (k-j)T) 



k = l i =1 k = l 

1 1 j 1 m 

Given all of the samples within the summations over N, 

the above set of m equations in the m unknowns, a., can be 

\ 

solved. If only the samples 
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s ( kT ) 1 <. k <. N 

are given, the set of equations above can not be solved 
because of the requi rement to know the samples 

s(( 1-j )T) 1 ( j (m 

However by windowing the samples so that all samples 
outside the region of interest are zero 

s(kT) =0 k <. D and k > N 

the summations over N in the set of equations above may be 
replaced by the au tocor r el a t i on of the windowed samples, 
s' (kT). 

N-j 

R ( j ) = ^ s'(kT)s'((k+j)T) 
k = l 

0 < j < m 

This assumption may be made because the number of samples, 
N, is normally much greater than the order, m, of the set 
of equations. Therefore relatively few samples are lost. 
The window function used will not significantly alter the 
samples in the center of the frame, and therefore the 
resulting coefficients will be a correct approximation for 
that segment. The set of linear equations may now be 
wr i tten 



m 

R( j ) = ^a. R ( i -j ) 
i =1 

1 <. j <. m 

These equations may now be solved for the linear predictive 
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coefficients, a., 1 <_ i <. m. 

If the system being studied is stationary or we are 
only considering a pseudo-stationary segment of the system 
output, and if the order of the model is sufficiently close 
to the order of the real system, future values of the 
variable may be calculated recursively from previous 
values. In the following section we will see how this 
theory is applied to speech modeling and reconstruct! on. 



B. LINEAR PREDICTIVE CODING FOR VOICE ANALYSIS 

The digital model used for speech synthesis is shown in 
figure 8. The discrete time excitation function is e(nT) 
and the synthesized speech output is s(nT). 



A A A A 



V 






uv 



VOCAL 

TRACT 

FILTER 



s (nT) , 



FIGURE 8. SPEECH SYNTHESIS MODEL 



The vocal tract filter is assumed to be all-pole and 
therefore can be represented by the z-domain equation 



H(z) = S(z) = 

E ( z ) m 

TT ( z-p. ) 
i =1 



Multiplying out the denominator and dividing both numerator 
and denominator by z m yields. 
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H(z) = S ( z ) 



1 



m 

E(z) 1- V a. z 
i=l ' 



This z-domain equation is converted to a discrete time 
domain equation as follows 

m 



S ( z ) ( 1 






= 1 



E(z) 



m 

S(z) = E(z) a. z 1 S(z) 

i =1 



s(nT) 



m 



e( nT) 




a. s( ( n-i )T) 



i =1 



If the excitation function e(nT) equals zero for a given 
sample, then this equation is similar to the first equation 
in the previous section on the theory of linear prediction. 
The coefficients of the z-domain filter transfer function 
are equivalent to the linear prediction wieghting 
coeff i c i ents . 

Analysis of the sampled speech waveform is used to 
calculate the prediction coefficients which are then used 
in an inverse filter to determine the excitation function 
from the input speech. This inverse filter may be 
represented as 
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or as 



ELzl = 
S( z) 



1 



m 




m 

E(nT) = S(nT) a. s((n-i)T) 

i =1 

and is construted as shown in figure 9. 




The input speech has been broken into vocal tract 
character i s t i cs determined by the prediction coefficients 
and excitation signal character i st i cs which remain to be 
determined. During the encoding process the output of the 
inverse filter may also be considered an error signal 
because it is the difference between the actual speech 
sample and the predicted speech sample. 

During voiced speech the vocal tract filter in figure 9 
acts as a model for the total transfer function which is 
due to the glottal pulse shape, the actual vocal tract 
shape and the output reflection at the lips. Idealy during 
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voiced speech all of these effects are removed by the 
inverse filter and the error function is a train of 
impulses at the pitch frequency. 

During unvoiced speech the physical excitation function 
is a pseudo-random air pressure variation caused by 
turbulence at a constriction somewhere along the vocal 
tract. This wi de-band source is filtered by the portion of 
the vocal tract between the constriction and the lips. This 
portion of the vocal tract will resonate at certian 
characteristic frequencies but normally the number of peaks 
in the frequency domain response will be fewer than for 
voiced sounds because of the shorter segment of the vocal 
tract in use. During encoding of unvoiced speech the output 
of the inverse filter is pseudo- random because the inverse 
filter can't predict the output due to the random input. 

The speech model is not complete with just the 
determination of the coefficients of the vocal tract 
filter. During speech reconstruction it is necessary to 
know: 

(1) Which excitation signal, pulses or noise, to 

use. 

(2) Excitation pulse period for voiced sounds. 

(3) The gain multiplication factor. 

Although these quantities are not necessarily determined 
using linear prediction theory, they are none the less 
required for a working speech encoding/decoding system. 

During encoding, the marked difference in the error 
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signal for voiced and unvoiced speech can be used as the 
basis for the voiced/unvoiced decision. The energy of the 
error signal for voiced speech should be rather small in 
comparison to the energy of the input samples. On the other 
hand, during unvoiced speech the prediction is poor and 
most of the energy remains after filtering. The ratio of 
the average energy or root-mean-square value of the speech 
samples to the similar quantity of the error signal can be 
used to make the vo i ced/ unvo i ced decission. This ratio is 
compared to an empirically determined threshold and the 
segment is considered voiced whenever the ratio is greater 
than the threshold. 

The gain used during reconstruction is the amplitude 
multiplier of the excitation signal at the input of the 
vocal tract filter. The gain used during unvoiced speech 
may be simply the root-mean-square of the error signal. 

This gain coefficient is multiplied by the output of a 
random number generator which produces normally distributed 
numbers with a root-mean-square value of unity. 

The gain of voiced speech may also be determined from 
the root-mean-square value of the error signal. However 
during reconstruction of voiced speech the entire energy of 
the excitation signal is concentrated in a series of 
impulses which should have the same root-mean-square value. 
The root-mean-square value of a series of d i scr ete- t i me 
impulses with amplitude, a, and a period, p, intervals is 
approximated by 
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N 



1 

rms = N 




2 1/2 



i =1 



rms = I N 2 1/2 

N P a 



N> > p 



- 1/2 

rms = a p 



The output of a unit impulse generator should then be 
mu 1 1 i pi i ed by 



to insure that the same energy is input to the vocal tract 
filter as was output by the filter during encoding. The 
above method for calculating the gain needed during 
reconstruction is based on the assumption that the 
prediction error for voiced speech is caused entirely by 
the physical excitation function of the speaker. However 
the prediction error may be increased because the vocal 
tract was changing shape rapidly during the analysis frame 
or because of background noise at the microphone which 
would not be removed by the inverse filter. Either of these 
would cause an unwanted gain increase during 
reconstruction. A typical voiced speech waveform and the 
error signal generated from it are shown in figure 10. 



1/2 



G = rms p 
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(A) VOICED SPEECH 
WAVEFORM 







(B) ERROR SIGNAL 
WAVEFORM 




FIGURE 10 . 



The reliable determination of the pitch period of 
voiced speech is a problem for which the ideal solution is 
still undetermined. The periodic increase in the amplitude 
of the error signal at the pitch period is shown in figure 
10(b) and suggests the use of the error signal in pitch 
period determination. A number of algorithms exist for 
determination of the pitch period which generally involve 
various combinations cf the following processes. 

(1) Raising the error signal to a given power. 

(2) Low-pass filtering of the error signal. 

(3) Windowing the error signal. 

(4) Calculating the autocor rel at i on function of the 

filtered error signal. 

(5) Picking the peaks of the au tocor rel at i cn 

f unct i on . 

Experience has shown that pitch determination is 
computationally as difficult as the LPC parameter 



37 



determination and the literature on the subject illustrates 
the trade-off between hardware, software, computation time 
and reliability from method to method. 

C. L PC COMMUNICATION SYSTEMS 

A review of existing LPC communication hardware is 
useful because any method which alters formant and pitch 
characteristics of speech will be most successful if it is 
compatable with these systems. 

Currently off-the-shelf mi croprocessors are not fast 
enough to handle the algorithms described in real-time. 
However special purpose units which are designed along 
computer lines, do meet the real-time criteria. On the 
surface the word 'computer' might not seem to fit these 
special purpose machines, but a closer look will reveal 
that each has components which are the same as those of a 
computer: stored programming, memory, input, output, an 
arithmetic logic unit (ALU), an instruction set, and 
control components. Two processors which were developed at 
MIT's Lincoln Laboratory will be used to illustrate the 
state of the art in LPC voice terminals and certain 
similarities in their architecture will be evident. The 
first processor is the more flexible of the two and is 
designed to handle a wider verity of algorithms. The second 
was developed about a year later and was designed 
specifically for LPC algorithms with only minor changes. 

The first processor to be covered is the Lincoln 
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Digital Voice Terminal (LCVT) which was designed and 
constructed at the Lincoln Laboratory during the 1273-75 

i 

time frame. This processor is capable of carrying out 18 
million basic instructions per second with a 16-bit by 
15-bit multiplication taking four times as long. The 
execution time for each instruction is 165 nsec, which 
seems to conflict with the instruction rate. This is 
resolved by the pipelining of the three portions of each 
basic instruction: fetch, decode, and execute. The 
processor has separate memories for data and the program. 
The data memory capacity is 512 16-bit words and the 
program memory contains 1024 16-bit instructions. The 
pipeline instruction processing requires that the buses to 
and from the ALU be seperate and each is un i di r ect i ona 1 . 

Figure 11 shows the data paths of the LDVT (none of the 
control or timing lines are shown). There are four active 
registers: the P register which is the program counter with 
multiplexed inputs from the address portion of the 
instruction, the ALU, the sum of the X register and the 
address portion of the instruction, and itself incremented 
by one; the X register which is used for indexing memory 
addresses; the A register which is the accumulator; and the 
3 register which is actually a pair of registers used for 
i nput and output. 
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I N PUT 



FIGURE 11. LDVT DATA FLOW 

The ALU of the LDVT as shown separately in figure 12, 
has two sections: a standard programmable ALU which 
performs logical, addition anc compare operations; and a 
16-bit by 16-bit multiplier array which provides a 32-bit 
result in just 4 cycles. Either of these may be used with 
any input, however due to their common input and output 
only one may be used at a time. 

It 3s significant to note some of the requirements 
brought on by the pipelining of the instructions. The 
device does not have a main bus over which data flows in 
both directions. Generally all data flow is uni di rect i ona 1 
and in the case of the ALU input buffer registers are 
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needed to hold the data for the instruction being executed 
while the next instruction may have already read a value 
from memory and put this on the ALU input line, in addition 
to LPC algorithms at 2400, 3603 and 4800 bits per second, 
the LDVT has been programmed for adaptive predictive coding 
at 3000 bits per second and as a channel vocoder at 2400 
b i ts per second . 




Register 

FIGURE 12. LDVT ALU 

The second speech processor is the Linear Predictive 
Coding M i croproces sor (LPCM) which is disigned strictly as 
a low cost LPC terminal. The basic cycle time for this 
machine is 150 nsec. The data memory has 2K 16-bit words of 
which 1.5K is ROM and 0.5K is RAM. The program memory 
contains ll< of 48-bit words. The LPCM is almost free of 
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instruction decoding, with the only exception being the ALU 
operation. Figure 13 shows the instruction format and in 
figure 14 it is evident that parts of the instruction 
register are being input as control functions. Figure 15 is 
a block diagram of the LPCM and shows the two buses and the 
large number of registers needed to control the data flow. 

While these machines have varying degrees of 
adaptability, it does not appear that either could handle 
the additional computations described in the following 
sections wi thout major hardware modifications. However, a 
special purpose LPC code converter which could be used in 
conjunction with an existing terminal could probably be 
developed which would operate in real-time and not load the 
existing processor. 
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FIGURE 13. LPCM INSTRUCTION FORMAT 




FIGURE 14. LPCM CENTRAL PROCESSOR 
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FIGURE 15. LPCM BLOCK DIAGRAM 



V. ADJUSTMENT OF VOCAL TRACT PARAMETERS USING LPC 



One reference to voice cha racte r i st i c modification was 
found by the author ^Atal and Kauneur, 1 9 7 1 J . Although 
scaling of pitch, formant frequency and formant bandwidth 
was stated to have been accomplished, no description of the 
work was given. Other literature did provide useful 
information on formant frequencies and pitch periods which 
are typical for various speakers. It should be noted that 
there is a considerably larger variation, from speaker to 
speaker, in pitch period than in formant frequencies. As an 
example, two speakers, saying the same phoneme could easily 
have pitch periods that varied by a factor of two, yet have 
only a 10-20 per cent variation in formant frequencies. 
Different physical structure (vocal cords and the vocal 
tract) produce these speech cha racte r i st i cs (pitch period 
and formant frequencies, re spec t i ve 1 y ) and therefore their 
variation from speaker to speaker is only partially 
correl ated. 

The coded information produced from input voice by the 
LPC processor is very closely related to the physical 
structure that is producing the sound. On outDut, speech is 
recon s t ructed from the gain, pitch period and 
vo i ce/ un vo i cad parameters as well as the vocal tract 
prediction coefficients. The gain and pitch period can be 
varied as they stand but the variation of the prediction 
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coefficients is somewhat more complicated. The goal of 
varying these coefficients before reconstruction is to have 
the output voice have different pitch period and formant 
frequencies while retaining a natural sound and retaining 
the same Information, i.e. the same sequence of phonemes 
and voice inflection. 

Voice characteristics are associated with certain 
parameters of the LPC code. First, formant frequencies and 
bandwidths are associated with the LPC coefficients. The 
amplitude of the output voice is associated with both the 
gain coefficient and the formant bandwidths. The 
relationship between output amplitude and the formant 
bandwidth is due to the increased energy in the impulse 
response of a narrow bandwidth (high Q) transfer function. 
This is noted physically by the fact that speakers with 
highly resonant voices may speak louder for the same amount 
of energy expended. The pitch period is controled by the 
pitch period coefficient only. Finally, the vo i ce/ unvo i ced 
decission would normally not be changed. The exception 
would be if one was reconstructing whispered speech (the 
vocal cords are stationary) from normal speech. 

A. ADJUSTMENT OF FORMANT FREQUENCY AND BANDWIDTH 

The vocal tract model we are using has all real 
coefficients in the z-domain polynomial. Following directly 
from this is the fact that all poles must fall either on 
the real axis of the z-plane or in complex conjugate pairs. 
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Each of the complex conjugate pairs Is associated wi th one 
formant (resonator) of the speech model. The vocal tract 
transfer function is the product of these resonator 
transfer functions which are each of the following form 

1 

H (z) = -2TTCBW) T s -1 -47TCBW) T, -2 

f l-2e cos ( 2TT F T s ) z + e z 

where F is the center frequency of the formant, f , and BW 

is the bandwidth of the formant. The pole locations 

associated with this transfer function are 

z = x + j y 

This pair of poles must be moved in order to alter the 
frequency and bandwidth of this resonant section of the 
vocal tract model, but this must be done carefully so that 
the poles remain inside the z-plane unit circle, if the 
desired modification of the input speech is to reduce the 
bandwidth (increase Q) of the formants, the poles must be 
moved closer to the unit circle. If the distance from the 
center is multiplied by a constant factor, there is a 
danger of moving poles outside the unit circle and thereby 
causing instability during reconstruction. However, the 
magnitude of the pole is always less than one and may be 
raised to any positive power without danger of crossing the 
unit circle. It is shown as follows that raising the 
magnitude to a factor is equivalent to multiplying the 
formant bandwidth by that same factor. 

The transfer function with the complex conjugate poles 
above i s : 
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H(z) 



1 



-1 22-2 
l-2x z + (x +y ) z 

However with the pole locations in polar form 

x = A cos 9 Y s A sin 9 

and mak i ng use of 

2 2 
cos 9+s i n 0=1 

the eauations becomes 

1 

H(z) = -12-1 

1-2A cos0 z +A z 

Setting the terms of the characteristic equations equal we 
get 

-2TT CBW) T s 

2Acos3 = 2e co s ( 2 IT F T^, ) 

and 

2 -4TT (BW) T $ 

A = e 

when solved for A and 9 give 

-2TT (BW) T, 

A = e 

8 = 2T F T $ 

and inversely 

F =9 / 2TT T s 

BW • = (-In A ) / 2TT T 5 

If new formant character i s t i cs , F' and BW 1 , ara desired 
where 

F‘ - Jf 

and 
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BW ' = OCBW 



they may be implemented by moving the poles of the 
character i st i c equation so that 

9 ' ■ Je 

and 

In A ' = CX 1 n A 

which reduced to 




This method of implementing the pole shifts guarantees 
that no unstable poles will be created and is used in the 
following section in the realization of a LPC voice 
modification system. 

3. GAIN ADJUSTMENT 

The filter coefficients recons tructed from the 
relocated poles above may not have the same zero frequency 
gain characteristic as the filter used for inverse 
filtering during encoding. This situation can be 
illustrated graphically by the two vocal tract transmission 
characteristics shown in figure 16. 




BEFORE PROCESSING AFTER PROCESSING 
(A) (B) 

FIGURE 16. FORMANT GAIN 
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Although the formant frequencies in 15(b) are lower than 
the cor r es pond i ng frequencies in 16(a) as was desired, the 
overall gain was also changed. This would cause the 
recons tructed speech to be much softer than desired. 

A solution to this problem was to adjust the excitation 
function gain used during recons truct i on. This adjustment 
factor would be equal to the ratio of the zero frequency 
gains of the original and modified vocal tract filters. The 
vocal tract has the following z-domain transfer function. 

1 

H ( z ) = p 

x T a;Z '' 

i =1 

The above equation can be evaluated at 

jTT f./f, 

z = e 

to obtain the gain at frequency f. Evaluating the above 
transfer function at f=G yields the following equations. 

z -J = 1 

and 

1 

G ( 0 ) = P 

i =1 

This equation can be easily evaluated for both the 
coefficients of the vocal tract transfer function 
calculated from the input sequence and the coefficients 
calculated from the altered pole locations. The gain 
multiplication factor is then multiplied by the energy 
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measured in the error signal to get the excitation gain to 
be used during reconstruction. 

C. PITCH PERIOD ADJUSTMENT 

The adjustment of the measured pitch period may almost 
go without explanation except to note that if the pitch 
period is increased and all other coefficients remain 
unchanged, the output speech would be softer. This is due 
to the reduced energy (impulses less often) being input to 
the vocal tract filter and the resulting lower energy in 
the output speech. 
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VI. COMPUTER SIMULATION OF PITCH AND FORMANT MODIFICATION 



The process of pitch and formant modification was 
carried out on the IBM 360 computer with the input and 
output being accomplished on a hybrid system consisting of 
a COMCOR 5000 analog comDUter and an XDS 9300 digital 
computer. The interface between the XDS 9300 and the IBM 
360 was seven track digital magnetic tape. 'All work was 
done on five second segments to allow sufficient length for 
analysis while not using excessive computer processing 
time. 

A. VOICE INPUT AND DIGITAL SAMPLING 

The input voice was recorded on a standard single tract 
audio tape recorder at 7 1/2 inches per second (ips). 
Recording was done with a high quality microphone in a 
quiet but not sound-proof room. This digitizing was done at 
half speed to allow the digital computer to write the data 
onto tape without missing any data. This recording was 
played back at 3 3/4 ins with the output directed to an 
amplifier of the analog computer. The voice was amplified 
to a level aoprooriate for the analog computer (a +.1.00 volt 
machine). The amolifier output was passed through two 
forth-order analog filters set at 2350 Hz and 2400 Hz cut 
off freauencles. The output of the filters was then put 
into a sample and hold circuit at the inout of a 14-bit 
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analog to digital converter. The 14 bits produced were 
read by the XDS 9300 and placed in the most significant 
bits of the 24 bit XDS 9300 computer word. This process is 
illustrated in figure 17. 




FIGURE 17. DATA ACQUISITION 



The samol ing rate used was 5000 Hz. However the voice 
recording was played back at half speed and there r ore the 
equivalent lowoass filter cut off and the equivalent 
sampling rate were about 4750 and 10,000 Hz respectively. 

B. XDS 9300 OPERATION 

The operation of the XDS 9300 during the input phase 
was simply to read the data available at the output 0 ' c the 
analog to digital converter and place this data in an 
array. When an array of 1024 samples was filled it was 
written onto a seven track magnetic tape. This was done 
continuously so that no data was lost between blocks. The 
voice segment as it existed on the seven track tape 
consisted of 50 blocks of 1024 samples. Each sample was 
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recorded in a integer format ranging from +8388607 to 
-8388607 (+.( 2 **23 )-l ) , This tape was then used as the input 
to the IBM 360. 

C. IBM 360 INPUT PREPARATION 



When the 24-bit word, seven track tape created by the 
XDS 9300 was read by the IBM 360, the machine 
rep re sen tat i on of the values was not correct. This was due 
to the addition of the eight bits shown in figure 18. 



24-Bit XDS 9300 Word 



32-Bit Word Read by 
IBM 360 



Corrected IBM 360 Word 

FIGURE 18. 




The data conversion program (Appendix A.l) was used to read 
the data from the seven track tape and move the bits of 
each value as required. The program did not make the 
conversion from ones complement rep re senta t i on (XDS 9300) 
to twos complement reo re sentat i on (IBM 360) because any 
error caused would be well below the 14-bit quantization 
error. At this point the data was converted to floating 
point rep re sentat i on with values between j+100.0 and the 
average value of each sequence was calculated and 
subtracted from each data point. This insured that the 
input was a zero mean function. Each data sequence was 
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written into a separate file of a standard nine track I 3M 
360 tape for ease of further handling. 

D. SCOPE OF SIMULATION PROGRAM 

The goal of this research was to demonstrate the 
feasibility of voice modification and as a result only 
certain areas were studied. Spec i f i ca 1 1 y , all programming 
was done with the standard IBM 360 floating-point 
arithmetic, making no allowance for the effects which would 
be caused by the shorter word length and integer 
rep re sentat i on used in most voice processing systems. 
Further study of that area is warranted and would be 
especially critical in the determination of the pole 
location, which is covered later. 

The system degradation by background noise in the input 
speech was not studied except to note that the 
vo iced/ unvo i ced deciion threshold would need to be adjusted 
for a noise environment. 

Although the programs 'were written to allow variation 
in the order of the prediction, number of samples per frame 
and samDling interval, these were not varied. A 12th order 
voice tract filter was used throughout and proved to be 
sat i sfactory . The analysis frame length was 25.6 msec. 

(256 samples) and also remained unchanged. In any future 
use of these programs with a different frame length, 
attention would be required by the input format to insure 
that the analysis frame length is an integral multiple of 
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the input record length. 

Finally, in the following description of the programs 
the term 'LPC coefficients' will refer to the coefficients 
of the vocal tract model filter. The term 'LPC parameters' 
will refer to the entire set of parameters needed to 
reconstruct the output speech, i.e. the LPC parameters 
consist of the LPC coefficients, the gain parameter, the 
pitch period and the voicing indicator. 

E. LPC ENCODING 

The first step of the encoding process was to determine 
the filter coefficients. These coefficients were used in 
the inverse filter for determination of the error signal. 
The root mean square values of the input and error signals 
were compared to determine if the frame was voiced or 
unvoiced. Finally the pitch period was determined for 
voiced frames. This program is listed in Appendix A. 2. 

1 . LPC Coefficient Determination 

Determination of the LPC coefficients was done with 
the autocorrel at ion method in the subroutine named AUTO. 
First, the input data, s(n), was windowed by one of four 
available windows producing a temporary array, t(n), of the 
windowed data. 

t(n) = W(n) x s(n) 

The discrete autocorrel at ion of the temporary array was 
calculated for the discrete displacements of zero to the 
predictor order, p. 
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R ( j ) 



N-j 




i =1 



t(i) t(i+j) 



0 1 j < P 



The next step was the solution of the following matrix 
equa t i on . 



The auto correlation matrix in always positive definate, 
symetric and all values along a given diagonal are equal. 

A particularly efficient method of solution is available. 
This method is attributed to Durbin IMakhoul, 19751 and is 
implemented in subroutine COEFF. Durbin's algorithm is 
recursive and calculates the predictor coefficients for the 
Kth order from the coefficients for the (k-l)th order. The 
j th coefficient for the kth order predictor is a^.(k). The 
recursion formulas follow. 



P 




R( I I-j I ) a. = R( i ) 



j 



j-1 



1 < i ( p 



ECO) = R( 0 ) 



a . ( k) = R ( J ) 



J 




a. (j-1) R(j-i) / E ( k— 1 ) 



i=l 



1 1 j < P 



a . ( k) 



J 



a. (k-1) - a, (k) a . (k-1) 
j k k-j 



1 1 j 1 (k-1) 
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2 

E(k) = (l-a^(k) ) E(k-l) 

E(k) is the prediction order error resulting from limiting 
the predictor order to k. 

During the programming of COEFF the subroutine TEST 
was written to perform and print the results of the matrix 
multiplication. During the initial testing of the program 
various window functions were used in AUTO, however the 
prediction order error did not change significantly with 
the window function used. 

Certain researchers have noted that a lower order 
filter may be used during unvoiced speech. If this is 
desired, the coefficients for the lower order filters could 
be stored during the recursive steps of the algorithm above 
and later, when the frame is determined to be unvoiced, the 
lower order filter coefficients would be available without 
further calculation. 

The coefficients, a., used in the main program are 
the coefficients of the characteristic polynomial of the 
filter with a assumed to be unity. 

1 

H ( 2 ) = p 
i =0 

Therefore the negitive of the values calculated in COEFF 
were returned to the main program. 
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2 . Error Signal Determination 

The error signal, e(n), is determined by 
subtracting the predicted sample value, ”s(n) from the 
actual value, s ( n ) . 

e(n) = s(n) - s(n) 



s (n) 




s ( n- i ) 



P 

e(n) = s(n) + ^ a. S(n-i) 

i =1 ' 

This operation is carried out by subroutine ERR. In order 
to make a correct error determination at the begining of 
each frame, a number of samples equal to the order of the 
predictor were saved from the end of the previous frame. 
This eliminated additional error signal energy caused by 
poor begining of frame prediction and reduced the 
possibility of an incorrect voicing decision. Another 
possible solution to this problem would be just not 
analyzing the error for the first few samples of each frame 
and making the appropriate changes in the following 
routines that use the error signal. 

3 . Voicing Dec i s i on 

A comparison of input signal energy and the error 
signal energy was used to determine if a particular frame 
is voiced or unvoiced. Although the root mean square value 
of each set of data is actually proportional to the square 
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root of the energy in the signal, the root mean square 
value was used in this comparison. Whenever the root mean 
square value of the input signal divided by the root mean 
square value of the error signal was greater than a 
threshold value, the frame was determined to be voiced and 
the voicing indicator was set to one. Otherwise the voicing 
indicator was set to zero. 

^ . Pitch Period Determination 

The error signal was used in subroutine PITCH for 
determination of the pitch period of each voiced frame. 
First the error signal was passed through a recursive 5th 
order Butterworth filter with an 800Hz cut off, to smooth 
the signal. Extra samples of the error signal and filtered 
error signal were saved from frame to frame (zeroed during 
unvoiced frames) to insure a correct filtered error signal 
at the begining of each frame. The degradation of the 
system if this was not done was negligible but plots of the 
filtered error signal would have shown discontinuities at 
the begining of each frame if this had not been done. The 
frame was windowed to eliminate end effects and the 
autocorrelation function of the filtered error signal is 
calculated. The portion of the autocorrelation function 
from 12 to 180 samples was searched for peak values and the 
pitch period set equal to the location of this peak. 

Figure 19 shows a typical autocorrel ation function and the 
portion of the curve searched for the peak value. The peak 

9 

picking algorithm checked to insure that the value chosen 
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was not on the downslope of the center peak and was not a 
minor peak with a larger peak at a longer pitch period. 




Although this pitch determination algorithm worked 
satisfactorily in this program it is probably not as 
accurate and flexible as certain other, more complicated 
techniques available. It was used only for pitch periods 
from about 3 to 9 msec., but was satisfactory for them. 

F. L PC PARAMETER MODIFICATION 

The purpose of the program was to demonstrate the 
modification of voice cha racter i st i cs . The system was 
designed so that only the LPC parameters were needed to 
make the desired modifications. No other measurements of 
the input speech are needed. Of the parameters calculated 
from the input speech, only the voicing indicator remained 
unchanged. The LPC coefficients are varied as required by 
the desired formant frequency and bandwidth changes 
require. The pitch period is varied separately and the gain 
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is adjusted to correct for changes caused by formant 
bandwidth modification. 

1 . LPC Coefficient Modification 

The modification of the LPC coefficients is 
accomplished by three subroutines: POLES, ALT, and NEWCF. 
Subroutine POLES calculates the z-plane pole locations from 
the LPC coefficients. Subroutine ALT changes the locations 
of the poles according to the various scale factors 
specified by the main program. The new predictor 
coefficients are calculated by subroutine NEWCF. 

The predictor coefficients, a., are provided to 
subroutine POLES to get the p order z-domain polynomial 
which is factored into its component roots, the z-plane 
poles of the vocal tract filter. This factorization is 
done with library routine ZRPOLY which was sufficiently 
accurate and produced complex conjugate pairs which were 
exact complex conjugates. This simplified the problem 
which came up later, of separating the real poles and the 
complex conjugate pairs so that the proper scaling factor 
could be applied to each. The input polynomial had all 
real coefficients and therefore all the roots are real of 
in complex conjugate pairs. These poles are placed in a 
complex array and returned to the main program. 

The subroutine ALT was provided with the complex 
array of pole locations and it separated them into separate 
arrays of real and complex poles. Each complex conjugate 
pole pair was entered as one entry in the complex pole 
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array. The scaling factors provided to subroutine ALT 
cons i s ted of : 

(1) FSC - Formant frequency scaling factor 

(2) BSC - Formant bandwidth scaling factor 

(3) RSC - Real pole scaling factor 

(4) RUM - Real pole magnitude limit 

(5) SP - Sampling period 

The polar coordinates were determined for each pair 
of complex conjugate poles and the magnitude. A, and angle, 
0, of each were considered separately. The magnitude was 
raised to the power of the bandwidth scale factor and the 
angle was multiplied by the frequency scale factor. 

BSC 

A' = A 

9' = 9 x FSC 

The modified magnitude. A', and angle, 0', were used to 
determine the complex location and the calculated pole and 
its conjugate were put in the pole vector for output. 

During the alteration process each complex pair of poles 
was checked against a constant magnitude of 0.98 to insure 
that numerical instability or repeated impulses would not 
cause excessively large outputs. 

Each real pole was multiplied by the real pole 
scale factor and checked to insure that the magnitude was 
less than the limit prescribed. The effects of varying the 
real poles was not studied and a real pole limit of 0.95 
proved to guarantee sufficient damping of the output to 
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provide a nearly zero mean output. 

The poles from both the real and complex pole 
arrays were combined into one array for return to the main 
program. Subroutine ALT also provided graphical and 
printed output of the pole locations, before and after 
modification when this was desired. Figure 20 is an example 
of the graphical output which shows the z-plane pole 
locations before and after modification, in relation to the 
unit circle. 




FIGURE 20. VOCAL TRACT POLES 
X INPUT 

+ AFTER MODIFICATION 



Subroutine NEWCF performed the task of multiplying 
the poles to calculate the coefficients of the modified 
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characteristic equation for the vocal tract filter. This 
operation was done in double precision arithmetic because 
the predictor coefficients being calculated often differed 
by only small amounts. This process would require close 
study before this system could be implemented on a short 
word length processor. 

2 . Pitch Period Modification 

The pitch period was modified in the main program 
and consisted only of converting the pitch period (an 
integer) to floating point representation, multiplying by 
the pitch period scale factor, and reconverting to fixed 
point representation. Although changing the pitch period is 
relatively simple, a number of other changes are caused by 
modifying the pitch period. If the pitch period is 
shortened the gain must be reduced to make up for the 
increased energy being input to the vocal tract filter. 

The relationship between the pitch period and the formant 
bandwidth also requires further study. It appears that the 
formant bandwidths (Q's of the vocal tract resonators) 
should produce a impulse response which is significantly 
attenuated by the time the next impulse is input to the 
filter. There is most likely a feedback effect between the 
vocal tract resonators and the vocal cords vibration rate 
which is not considered by the model used. This effect is 
noted in the graphical output as sharp discontinuities at 
the point where each new impulse is generated. 
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3 . Gain Adjustment 

Although overall gain of the system can be adjusted 
easily at the output, the relative amplitude from frame to 
frame must be retained during the processing. The gain 
coefficient, root mean square of the error function, is 
adjusted to account for the change in the energy of the 
vocal tract impulse response brought about by the bandwidth 
changes. As was described earlier the ratio of the original 
and modified vocal tract filter gain a zero frequency is 
used to estimate the ratio of inpulse response energy. 
Although this is not strictly true, as long as the scaling 
factors are limited to those which produce realistic 
speech sounds, this appears to work very well. The zero 
frequency gain of the original vocal tract filter, G(in), 
is calculated before the LPC coefficients are modified. 



The value of both a^, and a' is unity. After the 

o o 

coefficients are modified the same calculation is performed 
aga i n. 



P 



G( in) 




a . 



P 



G( ou t ) 




The root mean square of the error signal, rms(E), is 
multiplied by the ratio to obtain the new % a i n coefficient 



rns '(E). 

rms'(E) = rms(E) x G(in) / G(out) 

G. SPEECH RECONSTRUCTION 

Reconstruction of the sampled speech waveform, from th 
modified LPC parameters is accomplished by subroutine 
RECCN. This routine not only decodes both voiced and 
unvoiced speech, but also makes allowance for the 
transition of varying parameters from frame to frame. The 
LPC parameters from the previous frame are saved between 
calls to subroutine COEFF and are used during the current 
frame when needed. It is also necessary to save output 
values from the previous frame to allow the recursive 
calculation of the output values at the begining of the 
current frame. 

1 . Unvioced Speech 

Curing continuous unvoiced speech (as opposed to 
the previous frame being voiced) the new LPC parameters ar 
used immediately upon entry to subroutine RECON. The 
excitation function is determined by calling a library 
routine GGNOF which returns normally distributed random 
numbers with zero mean and a variance of unity, and 
multiplying the value returned by the gain parameter. The 
excitation function is changed for every output sample to 
simulate the continuous excitation caused by turbulent air 
in the vocal tract. The vocal tract filter is implemented 
by the recursive addition of past values of the output to 
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the excitation function. The z-domain transfer function 

s(z) 1 

e(z) = p 

i =1 

is implemented with the discrete time function 

P 

s(n) = e ( n ) - V -1 a * s(n-i) 

L ’ 

i =1 

where s(n) is the output sample and e(n) is the excitation 
funct i on . 

2 . Voiced Speech 

During voiced speech a certain amount of continuity 
must be maintained from frame to frame. This was 
accomplished by allowing any uncompleted pulses from the 
previous frame to finish before the parameters are changed. 
Immediately upon entering the subroutine during voiced 
speech the pulse period counter is tested to see if it is 
equal to the former pulse period. If the former pulse is 
not complete the routine goes ahead and recursively 
calculates the output values. Upon completion of a pulse 
from a former frame or any pulse during the current frame, 
the new LPC parameters are used to replace the old one. 
There was a direct replacement for all parameters except 
the gain coefficient. The geometric mean of the old and new 
gain coefficients is used for the gain on the current pulse 
and the old gain replaced with the gain just calculated. 
This provides for the difference between the old and new 



gain parameters to decay exponentially but prevents sharp 
changes in amplitude from frame to frame and make the 
output speech more natural. 

3 . Transition Frames 

If the current frame and the previous frame were 
not of the same type care must be taken to insure that all 
parameters are changed together. If LPC coefficients for 
unvoiced speech were used with a pulsed output an unnatural 
sound would be likely to be produced. During the 
transition from unvoiced to voiced speech, the retained 
values from the previous frame are normally small in 
comparison to the amplitude of the pulsed excitation 
function. Therefore the voiced speech production may begin 
immediately. When the opposite is true, the large 
amplitude samples near the begining of a output pulse are 
significantly larger than the unvoiced excitation values. 
Therefore whenever unvoiced speech follows a voiced frame, 
the previous output pulse is allowed to finish. The 
damping that occurs during the voiced pulse normally 
reduces the magnitude of the samples near the end of the 
pulse to the point where they will not interfere with the 
unvoiced speech to follow. 

H. OUTPUT PROCESSING 

The reconstruced speech samples are output onto a 
standard nine track IBM 360 magnetic tape. These values 
were later input to a data conversion program (Appendix 



A. 4) which converted the floating point values to integers 
which were in the proper format for the XDS 9300 and within 
an appropriate range for the XDS 9300's digital to analog 
converter. The necessity of using a seven track tape for 
data transfer still existed, so the significant bit of the 
integers had to be shifted into the proper position so that 
none of the eight bits dropped during the writing of each 
value onto the seven track tape would effect the data. 

This tape was input to the XDS 9300 which via the digital 
to analog converter made the samples available on the 
CQMCOR 5000 in analog form. 

These samples were output at a rate of 5000 per second 
thru a sample and hold circuit. Again two lo w pass filters 
were used to remove the time quantization noise from the 
samples. The analog waveform was recorded at 3 3/4 i ps on a 
standard tape recorder which could be played at 7 1/2 i ps 
to hear the reconstruced speech. 

I . GRAPH I CAL OUTPUT 

The programs described above were also able to produce 
a varity of graphical outputs to assist the researcher in 
following the signals through the LPC processing. The 
waveforms available from these programs are: 

( 1 ) Input speech 

(2) Error signal before filtering 

(3) Error signal after filtering 

(4) Reconstructed output speech 
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The z-plane pole locations determine the formant 
frequencies and bandwidths and were also available for 
graphical display. A seperate program (Appendix A. 3) was 
written to display the logarithmic power spectral density 
of the input and output speech for a number of consecutive 
frames and proved useful in analysis of the output quality. 
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VII. RESULTS 



The desired result of this study was the recons tructi on 
of speech at different pitch and formant frequencies than 
that of the input speech. The complete process of 
encoding, modification and decoding was accomplished for 
three 5-second segments of speech. Upon completion of the 
process most listeners agreed that although the input 
speech was female, the modified output speech sounded 
typically male. Although the audio output was somewhat 
lacking in quality it was intelligible. 

Examples of the printed and graphical computer output 
are given in Appendix B. Two examples are completely 
covered. The first 384 msec, segment (15 frames) is of the 
vowel 'e' and the second segment is of the transition from 
a fricative to a voiced sound, 'sa', from the begining of 
the word salt. Both were derived from a recording of a 
female speaker were recons true ted first without 
modification and then with mod i f i caat i ons which consisted 
of reduction of the pitch frequency by a factor of 0.58 and 
reduction of the formant frequencies by a factor of 0.88. 
First the input waveform with the logarithmic power 
spectral density plot of that portion of the speech is 
given. Examples of the printed processing summary are next 
and are followed by the waveforms of the error signal and 
the filtered error signal. Plots of the vocal tract pole 
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locations are shown with the poles at input supe r i mposed on 
the poles after modification. Finally, speech waveforms for 
both unmodified and modified output with their respective 
logarithmic power spectral density functions are displayed. 
The audio output is available from the author on request, 
in the form of an audio tape recording. This tape 
recording is described in detail in Appendix C. 

The results above demonstrate the feasibility of the 
use of linear predictive coding as a technique for voice 
modification. This research also indicated areas in which 
further study and improvement may be made. Some of these 
areas are: 

(1) The effect of noise during voiced speech on the 
prediction error and on the gain calculated from 
the error. It may be possible to use only the 
energy occuring at the peaks of the error signal 
and thereby attribute the remainder of the error 
signal as being due to noise. 

(2) The effect of the use of different window 
functions in autocorrelation function calculation 
and how this variation effects pitch period 
determination and the voicing threshold. 

(3) The possibility of constructing a LPC 
processing system with asyncronous clocks for the 
frame timer and the output sample gereration. This 
would produce a very similar effect to that 
accomplished here, but probably at a reduced cost. 
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VIII. CONCLUSIONS 



With the refinement and standa rd i za t i on of LPC 
commuication processors, the ratio of processing time to 
real time for unaltered communication is expected to drop 
below the current 65%. The available computation time may 
be used for the pitch and formant alteration described 
above or for other modification which can be accomplished 
at either the transmitting or receiving processor and still 
allow real time voice communications. 

A number of possible applications of the speech 
frequency cha rac te r i s t i c modification described are: 

(1) A digital hearing aid for persons (such as the 
author) with high frequency hearing loss. 

(2) Radios in military vehicles which would produce 
speech in a frequency range different than the 
range of the predominant noise in the vehicle, i.e. 
low pitch voice in turbine ai rcraft with high 
frequency noise and high pitched voice for 
helicopters and tanks where low freauency noise is 
most prevalent. 

(3) Voice channel jammers which would produce 
random phonemes with pitch and formant 

cha rac te r i s t i cs similar to the current users of the 
channe 1 . 

As LPC communications systems become common because of 
their low data rate requ i rements , the use of the LPC 
parameter modification will be desired to extend the 
flexiblity of voice communication and storage systems. 
Frequency modification is one viable process available. 
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APPENDIX A.l 



SEVEN TRACK TO NINE TRACK TAPE CONVERSION 
PROGRAM 



DIMENSION IDATU024),DAT( 53248 ) 

FACTOR = 100.0/(2.0**23) 

REWIND 2 
REWIND 4 
N= 1 C24 
K= 0 

5 K = K+1 
IF(K.GT.13) STOP 

6 BSU M = 0.0 

7 J = 0 

8 J=J+1 

IF ( J .LE.50 ) GO TO 10 
12 READ(2»15 ,END=200) IDAT 

C-0 TO 12 

10 REAC(2,15 ,END=200,ERR=o0) IDAT 

15 F3RMAT(128( 8A4) ) 

17 CALL FORM (I DAT »N) 

JJ=( J-l ) *1024 
SUM =0.0 
DO 20 1=1,1024 
II = I+JJ 

DAT (II) = FLOAT(IDAT( I ) )*FACTOR 
SUM = SUM + D AT ( II ) 

20 CONTINUE 

SUM = SUM/1Q24. 

WR I TE(6 ,2 5 ) J,K 

25 F3RMAT( 40X, • * RECORD ',13,' OF FILE ',13, 

* * HAS BEEN READ *• ) 

IF (J.LE.l) W P ITE ( 6 ,30 ) K ,SUM , ( OAT ( L ) , L=1 , 1 024 ) 

30 FORMAT! ' FILE =',I3,' AVG = • ,E16 . 7 //( IX , 3E14 .7 ) ) 
IF (J.LE.l) WRITE(6,31) IDAT 

31 FQRMAT(1X,8I15) 

BSUV = BSUM + SUM 

90 GO TC 9 

2C0 J=J-I 

WR ITE (6,2 05 ) K, J 

205 FORMAT ( ' END CF FILE ',13,16, 

* ' RECORDS HAVE BEEN READ') 

8SUM = BS UM/ FLOAT ( J ) 

DO 55 J=i, 51200 
DAT ( J ) = DAT( J ) — B SUM 
95 CONTINUE 

WR I TE( 4 ,9 8 ) ( QAT( L) ,L = 1 ,51200) 

93 FORMAT ( 128A4) 

ENCFILE 4 

WRITE (6, 30) K,B$UM, (DAT (L) , L=1 ,1024) 

100 GO TO 5 

60 WRI TE ( 6 ,6 5 ) K 

65 FOR ma T { * ** ERR FILE', 13,' ** ' ) 

GO TO 17 
END 
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APPENCIX A .2 



LINEAR PREDICTIVE CODING AND VOICE 
MODIFICATION PROGRAM 



LINEAR PREDICTIVE CODING AND SPEECH MODIFICATION 
PROGRAM 

SAMPLED SPEECH IS INPUT VIA FILE FT02FCC1 (TAPE OR 
DISK) IN FORMAT 128A4 FOR EFFICIENT STORAGE 

SPEECH IS ENCODED INTO LPC CONSISTING OF PITCH PERIOD 
UPP), VOICED/ UNVOICED DECISION (IVF), GAIN FACTOR 
<RMS=)* AND LPC COEFFICIENTS ( A( I ) ) 

MODIFICATIONS TO CHANGE POLE POSITIONS MAY BE SPECIFIED 
SAMPLED SPEECH IS RECONSTRUCTED ANO CUT PUT ONTO FILS 
FT03FQ01, ALSO IN 128A4 FORMAT 

PROGRAMMED BY G.T.HALL, 1 978 

DI ME NS I ON X (2 56), A (14), XX (14 ) , E( 256 1 ,XO( 256 i 
DIMENSION EF( 256) , E S< 5 ) , EFS ( 5 ) , Z£ RC (2 5 6 ) 

COMPLEX P (14) 

OATA XX,EFS,ES, ZERO/28 0*0. 0/ 

SET VOICE/UNVOICE THRESHOLD 

THRESH = 2.05 
IWIN = 1 

SET QRCER OF PREDICTOR 
IP = 12 

PLOTTER OUTPUT 

1= INPUT 2=E RROR SIGNAL 

3 = F IJ.7ERE0 ERROR 4=0UTPUR 

5=P0LE LOCATION (FIRST NPLPLT FRAMES) 

IXPLT = 5 
NPLPLT = 10 

SET I WRXX=1 FOR PRINTED RESULTS FROM SUB 

IWR = 1 
IWRERR = 0 
IWRAUT = C 
IWRALT = 1 
IWRPP = 0 
IWRPOL = 0 
IWRNC = l 

.SET MODIFICATIONS DESIRED 

(FSC ) FREQUENCY SCALE COEFF 
(BSC) BANDWIDTH SCALE COEFF 
(PSC) PITCH PERIOD SCALE COEFF 
( RSC ) REAL POLE SCALE COEFF 
(RLIM) REAL POLE MAGNITUDE LIMIT 
(SP) SAMPLING INTERVAL 



FSC = 
BSC = 
PSC = 
RSC = 
RLIM = 



0. 38 

0 .63 

1. 73 

1 .0 

0.95 



SP = C.00 01 
SET NUMBER OF SAMPLES PER FRAME 
N = 256 



256 
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c 

c 

c 

c 



1 

2 



15 

C 

c 

c 



20 

C 

C 

C 



21 

C 

C 

c 



22 



23 

C 

C 

c 

c 

c 

c 



25 

C 

C 

C 



30 



40 

c 

c 

c 



41 

42 
C 
C 
C 



SET NUN BER OF FRAMES ( NFRAME ) AND NUMBER OF 
FRAMES SKIPPED BEFORE FIRST ANALYZED 

NFRAME = 15 

I Skip s 28 

IF ( ISKIP .LE.O ) GO TO 2 
00 1 L = 1, ISKIP 

READ ( 2, 1 5, END = 999) (X<J),J=1,N) 

CONTINUE 

IF ( IXPLT. LT. 5. AND, IXPLT. GT.O) CALL VPLTIN(N) 

DO 200 I = ItNFRAME 

READ (2,15, END = 999) (X(J),J=1,N) 

FORMAT! 12 8A4) 

IF (IXPLT. EQ.l) CALL VPLT(X) 

DETERMINE RMS VALUE OF SPEECH SAMPLES 

CALL RMS ( X » N, RMSX ) 

IF (IWR.EQ.1) WRITE (6,20) I,RMSX 

FORMAT ( * 1 FRAME • , 1 4 //IX, • RM S VALUE OF SAMPLES = 

* F 18. 8) 

DETERMINE PREDICTOR CGEFF BY AUTOCORRELATION METHOD 

CALL AUTO (X,N,A, IP , IW I N, IWRAUT ) 

IF ( IWR.E3 .1 ) WR IT E ( 6 ,21 ) ( ( J, A( J) ) , J=1 , I P ) 

FORMAT! /IX, 'PREDICTOR C GEFF I C I ENT S ' / ( 1 OX, I 3 , 1 X , FI 3 . 8 ) ) 

DETERMINE ZERO p REQ GAIN GF VOCAL TRACT TRANS FCN 

GIN = 1 .0 
DO 22 J = 1,.IP 
GIN = GIN + A (J ) 

CONTINUE 
GIN = 1.0 /GIN 

IF (IWR.EQ.1) WR IT E ( 6, 22 ) GIN 
FOR NAT ( /' G IN =' ,F10.5 ) 

DETERMINE POLES OF CHARACTERISTIC EQUATION 

CALL POLES ( A, IP ,P ,1 WRPOL ,ICK) 

INVERSE FILTER SAMPLES TO GET ERROR SIGNAL 

CALL ERR (X, N, A, IP , E,XX ) 

IF ( IXPLT .5 Q. 2 ) CALL VPLT(E) 

IF ( I WR ERR. EQ.l) WRITE (6,25) (E(J),J = X,N) 

FORMATtlX ,10F12.4 ) 

DETERMINE RMS VALUE OF ERROR 

CALL RMS ( E,.N ,RMSE ) 

IF (IWR.EQ.1) WRITE (6,30) RMSE 

FORMAT! /I X, ' RMS VALUE OF ERROR = ',F18.8) 

RATIO = RM SX/RMSE 

IF (IWR.EQ.1) WRITE (6,40) RATIO 

FORMAT! /I X,* RATI 0 SAMPLE RMS TC ERRCR RMS = »,F18.8) 
TEST IF VOICED CR UNVOICED 
IV F = 0 

IF (RATIO. GE. THRESH ) IVF = 1 
IF (IVF. EQ.l) WRITE (6,41) 

FORMAT (/ ’ THIS FRAME IS VOICED'/) 

IF (IVF.EQ.O) WRITE (6,42) 

FORMAT(/* THIS FRAME IS UNVOICED'/) 

IF UNVOICED BYPASS PITCH DETECTION 

IF (IVF.EQ.O) GO TO 45 

CALL PITCH (N,E,EF,ES,EFS,IPP,IWRPP) 
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IF (IXPLT. EG. 3) CALL VPLKEFi 
GO TO 49 
C 

C IF UNVOICED ZERO SAVED POST FILTER ERROR 
C 

45 CO 46 J = 1,5 
EFS(J) = 0.0 

46 CONTINUE 

IF ( IXPLT . EQ.3 ) CALL VPLT(ZERO) 

C 

C DETERMINE NEW PITCH PERIOD 
C 

49 IPPN = IFIX( FLOAT! I PP) *PSC+0. 5) 

C 

C ALTER POLE LOCATIONS 
C 

IF (I.EQ.l .AND. IXPLT. EQ. 5 ) CALL PLOTS! IA, IB, IC ) 
IFU.EQ.NPLPLT.AND. IXPLT.EQ.5) IXPLT=0 
CALL ALT2 ( F SC, 8S C, RSC ,RLIM , SP , I P , IWRALT , I XPLT) 
WRITE(6,51) IPPN 

51 FORMAT! /' PITCH PERIOD AFTER MOD I F ICAT I ON * , 1 3 ) 

C 

C CALCULATE NEW PREDICTOR COEFFICIENTS 
C 

CALL NEWCF(IP,P,A,IWRNC) 

DO 5C J = 1, 1 P 
JJ = J+N-IP 
XX < J ) = X ( JJ) 

50 CONTINUE 
C 

C DETERMINE ZERO FREQ GAIN CF VOCAL TRACT TRANS FCN 
C 

GOUT =1.0 
DO 52 J = 1,-IP 
GOUT = GOUT +A (J ) 

52 CONTINUE 

GOUT = l.O/GOLT 

IF (IWR.EG.l) WRITE!6,53) GOUT 

53 FORMAT ( / 1 G OUT =»,F10.5) 

C 

C ADJUST OUTPUT GAIN 
C 

RMS E = RMS E*G IN / GOUT 

CALL RECON ( A, IP, RMS E,IVF, IPPN,N,XC) 

IF (IWR.EQ.l) WRITE (6,54) (XO(L)tL = 1,N) 

54 FORMAT(/' OUTPUT S AMPL ES 1 /( IX , 1 0F13 .5 ) ) 

IF (IXPLT. EQ. 4) CALL VPLTIXO) 

WR ITE (3, 15 ) ( XQ( J ) ,J = 1,N) 

20 0 CONTINUE 

999 IPEN = 999 

CALL PLOT (A, B, IPEN ) 

STOP 

END 
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SUBROUTINE AUTO < S , N, A, IP, I WIN, IWR ) 

C 

C DETERMINE LINEAR PREDICTION COEFFICIENTS FOR A SET OF 
C INPUT SAMPLES USING THE AUTOCORRELATION METHOD 
C 

C S = VECTOR OF INPUT SAMPLES 
C N = NUMBER OF SAMPLES 
C A = VECTOR OF PREDICTOR COEFFICIENTS 
C IP = NUMBER OF PREDICT CR CCEFF ( ORDER CF MODEL ) 

C IP.LT.17 

C 

C IWIN = TYPE OF WINDOW ( SEE SU3R WINDOW ) 

C IWR = 0 NO PRINTING OF PREDICTION COEFFICIENTS 

r 

C REF: MAKHOUL: LINEAR PREDICTION 
C PROC IEEE, APR 75 

C 

DIMENSION S (1 ) , T < 5 1 2 ) ,R(16> ,A(1 ) 

CALL WINDW ( S»T,N, I WIN) 

C 

C CALCULATE AUTOCORRECATI CM 
C 

RO = 0.0 
DO 10 1=1, N 
RO = RO + T ( I )**2 
10 CONTINUE 

DO 3 C J = !,IP 
SUM = 0.0 
NN = N-J 

DC 20 1=1, NN 

SUM = SUM+T ( I ) *T (I+J ) 

20 CONTINUE 

R ( J ) = SUM 

30 CONTINUE 

IF ( IwR .EQ .1 )• WRITE! 6,31) RO , ( R (L ) , L = i , I P ) 

31 FORMAT (/IX, r AUTOCOREL V ALS • , F16 . 5/1 X, 3F16 . 5 /l >, EF16 . 5 ) 
C 

C SOLVE matrix EQN FOR A VECTOR 
C 

CALL COEFF(RO,R,IP, A, IWR) 

C 

C TAKE NEGITIVE OF PREDICTOR COEFF TO GET 
C CQE C F OF CHARACTERISTIC EQN OF FILTER 
C 

DO 60 I = 1 .IP 
A( I ) = -A ( I ) 

60 CONTINUE 

IF (IWR.NE.O) WRI TE ( 6 , 70) ( ( I , A ( I ) ) , 1=1 , 1 P ) 

70 F0RMAT(/1X, 'PREDICTOR C CEFF IC IEN T S * / ( 10X , I 3 , IX , FI 3 . 8 ) ) 
RETURN 
END 
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SUBROUTINE COEFFtRO »R»N«A,IWR) 

C 

C SOLVES THE MATRIX EQUATION PR A = R 
C 

C RR = AUTOCORRELATION MATRIX 



C RR = R(O) Rd ) R (2 ) R (N-l ) 

C R ( 1 ) R( 0 ) R ( 1 ) R ( N— 2 ) 

C R (2) R ( I ) R ( 0 ) R (N — 3 ) 

C R(N-l) R ( N— 2 ) R ( N-3 ) R(O)* 

C 



C R = AUTOCORRELATION VECTOR 
C R = R( 1 ) 

C R ( 2 ) 

C R ( 3 ) 

C 

C P ( N) 

C 

C A = VECTOR OF PREDICTOR COEFF 
C A - A ( 1 ) 

C A ( 2 ) 

C A (3 ) 

C 

C A { N ) 

C 

C METHOD ATTRIBUTED TO DURBIN AS DESCRIBED IN 
C ‘LINEAR PREDICTION' BY MAKHQUL,PROC IEEE APR 75 
C P. 566 

DIMENSION AKI20), AO (20) ,A(20),R(2C) 

P I R ST ITERATION 
£0 = RO 

AK ( I ) = R(1)/E0 
Ml) = AK ( 1 J 
E = <1.0-AK('1)**2)*E0 
EO - E 

AO ( I ) = Ad) 

FOLLOWING ITERATIONS 

DO 100 I = 2 » N 
IM1 = I-i 
SUM = 0.0 

DO 20 J = 1 » I M 1 
iVJ = I-J 

SUM = SUM+R( IMJ ) *AQ< J) 

CONT INUE 

AK { I ) = ( R(i )-SUM)/EC 
A( I ) = AK ( I) 

CO 30 J = It IM1 
IMJ = I-U 

A ( J ) = A0( J)-AK( I)*AQ(IMJ) 

CONTINUE 

E = (l.O-AKCI ) **2 ) ^EO 
EO = E 

DO 50 J = 1,1 
AO ( J ) = A ( J ) 

CONT INUE 
CONTINUE 

C PRINT E (REMAINING ERROR DUE TO LIMITING 
C ORDER CF APPROXIMATION) AND A CHECK CF SOLUTION 
C IF OESIRED 
C 

IF(IWR.EQ.l) WRITE ( 6 tlOl) E 
101 FOR MAT ( ‘ SUB COEFF E= ’,F18.8) 

IFdWR.EQ.il CALL T EST ( A t RO , R , N ) 

RETURN 

END 



C 

C 

c 



c 

c 

c 



20 



30 



50 

100 

C 
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SUBROUTINE TEST (A,R0,R,IP) 

C 

C MULTIPLIES PREDICTOR CCEFF VECTOR 
C A BY THE AUTOCORRELATION MATRIX RR AND CHECKS 
C THE VALUE AGAINST THE AUTOCORRELATION VECTOR 
C TO INSURE ACCURATE SOLUTION. 

C 

DIMENSION AlIF)tR(IP) 

DO IC I = 1,IP 
SUM = 0.0 

DO 9 J = 1 ,IP 
L = IABS(I-J) 

IF(L.EC.O) SUM = SUM+A(J)*RO 
IF(L.NE.O) SUM = SUM+A(J)*R(L) 

9 CONTINUE 

WRITE (6,1 5 I I , R ( I ) * SUM 

15 FORM AT ( ' R ( * , 12 ,' ) = *,2Ei4.4,’ = SIMM 

10 CONTINUE 
RETURN 
END 



SUBROUTINE POLES ( A , IP, P, I WR, ICK ) 

CALCULATES POLES OF CHARACTERISTIC ECN FROM 
PREDICTOR COEFFICIENTS AND IF WANTED PRINTS 
OR PLOTS THOSE POLES 

A = VECTOR OF PREDICTOR COEFFICIENTS 
IP = NUMBER CF CCEFF ANC POLES 

COEFF AO IS ASSUMED TO BE 1.0 
P = COMPLEX VECTOR OF POLE LOCATIONS 
I WR = 0 NO PRINTING OF POLES 
ICK = 0 ALL POLES INSIDE UNIT CIRCLE 
= 1 POLE OUTSIDE UNIT CIRCLE 

DIMENSION A(l) ,8(21) ,X(20> ,Y( 20) , NAME (20) 

COMPLEX P (1 ) 

3(1) = 1.0 
CO 10 1=1, IP 
II = 1+1 
3(11) = Ail) 

10 CONTINUE 

II P = I P+1 

CALL ZRP0LY(3,IP,P,IER> 

IF ( IWR.NE .0 ) WR IT E ( 6,20 ) ( ( I ,P( I) > , 1=1, 1 P ) 

20 FORMAT! //10X,« POLES CF CHAR E ON • / ( 10X , 13 , IX , 2 El A .7 ) ) 
ICK = 0 

00 30 I * 1,1 P 

IF (CABS(P( I) ) .LE.1.0) GO TO 30 
ICK = 1 

IF(IWR.NE.O) WR ITE ( 6 » 25 ) I 
25 FORMAT! 20X, * POLE NUMBER ',13, 

* ' ABOVE IS OUTSIDE UNIT CIRCLE*) 

30 CONTINUE 

RETURN 
END 
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SUBROUTINE ERR ( S , N , A , I P, 6 , S X ) 

C 

C DETERMINE AN ERROR VECTOR OF DIFFERENCE 
C BETWEEN ACTUAL SAMPLE VALUES AND THE 
C VALUES PREDICTED FROM PAST SAMPLES. 

C 

C S = VECTOR OF SAMPLES 
C N = NUMBER OF SAMPLES 
C A = VECTOR OF 'PREDICTOR COEFF 
C IP = NUM8ER OF PREDICTOR COEFF 
C E = VECTOR OF € R ROR VALUES 
C SX = EXTRA SAMPLES ( IP OF THEM ) 

C SAVED FROM LAST FRAME 

C 

C THE ERROR IN THE DIFFERENCE BETWEEN THE 
C CURRENT SAMPLE AND THE WEIGHTED SUM OF 
C ThE LAST IP SAMPLES. 

C 

DIMENSION S (1 )« A( 1 ) , E (1 ) , T ( 542 ) , S X ( 1 ) 
DO 10 1=1, IP 
T( I ) = SX (I ) 

10 CONTINUE 

03 20 I =1 ,N 
Td + IP) = S ( I ) 

20 CONTINUE 

DO AC 1=1, N 
SUM = 0.0 

DO 30 J=i , IP 

II = I+J-l 

JJ = IP-J+1 

SUM = SUM+TUI ) *A( JJ) 

30 CONTINUE 

ECI* = S(I ) + SUM 
AO CONTINUE 

RETURN 
END 
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c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 



c 

c 



10 



15 

c 

c 

c 



c 

20 

30 

c 

c 

c 

33 



40 

41 



SUeROUT IN E P IT CHIN, E, EF, ES,EF S, IPP, IWR) 

DETERMINES PITCH PERIOD (IN NUMBER OF SAMPLES) 
FROM THE ERROR SIGNAL OF INVERSE FILTERED SPEECH 

N = NUMBER OF SAMPLES 

E = ERROR VECTOR 

EF = FILTEREO ERROR VECTOR (OUTPUT) 

ES = FIVE SAVED ERROR SAMPLES 

EF S = FIVE SAVED FILTERED ERROR SAMPLES 

IPP = PITCH PERIOD (OUTPUT) 

IWR = 1 FOR PRINTING DURING SUBROUTINE 



01 MENS I ON ES (5) ,EFS (5 ), EU) , EF ( 1 ) , R ( 256 > 
DIMENSION XI ( 261) , X0(261) 

FORM FILTERING VECT0R(N+5) 

DO 10 1=1,5 
XI ( I) =SS( I ) 

X0< I ) = EFS ( I ) 

CONTI NUE 

I TEMP =N+5 

DO 15 1=6 , ITEMP 

11=1-5 

XI ( I ) = E( I I ) 

CO NT I NUE 

DO 20 1=6, ITEMP 

8 UTTER WO RT H DIGITAL FILTER CUTOFF AT 800 HZ 



XO(I) = 0 . 447451239E- 3*XI ( I ) + C. 2227 2 56 2E- 2* XI ( 1-1 ) 

* *0.44745124E-2*XI( 1-2 ) +0 . 44745123 9E-2 *XI (I- 

* +0. 222725o2E-2*XI (I-4)+0. 447 451239E-3 *XI ( I- 

* +3 .41 07723 1*X0 ( I- 1 )- 4 .732 3 C £ 37*X0 (I - 2 ) 

* +3.42533523^X0(1— 3) -1.249 29 545*X0( 1-4 ) 

* +0.185257941^X0(1-51 

EF ( 1—5) =X0( I ) 

CONTINUE 
DO 30 1=1 ,5 
ES ( I)=E( I+N-5) 

EFS ( I )= EF ( I+N-5 ) 

CONTINUE 
IW IN = 4 

CALL WINDW(EF»XO»N, IWIN) 

CHECK FOR PEAKS 1.2 TO 13. C MSEC 



I TE MPC=N— 56 

IF(JWR.EQ.l) WRITE( 6,33) ( ( EF ( L ) , XQ ( L ) ) , L =1 , N ) 

FORMAT (IX ,10F13.5 ) 

DO 5 C 1 = 1,1 TEMPO 
SUM=0 .0 
ITE HPA=N-I 

DO 40 J=1 , ITE MPA 
SU M=SU M+XO (J )*XO(J+I ) 

CONTINUE 
R ( I ) = SUM 

IF(IWR.EQ.l) WRITE (6,41 ) I,R(I) 

FORMAT ( ' FILTERED ERROR AUTOCORRELATION FOR’, 14, 
* F13.3) 

IFU.LT.33) GC TO 50 
ITE ST=I-2 5 

IF (R ( ITEST) ,LT .0 .0 ) GO TO 50 
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ITEMPB= 1-34 



CO 45 J=ITEMPB, I 

IF ( RU TEST) .LT.R (J)) GO TO 50 



45 


CONTINUE 










ipp-= itest 

WRITE! 6,46) 


I FP 






46 


FORMAT (' PITCH 


PER 100 


IS', 14) 




RETURN 








50 


CONTINUE 
IPP=100 
WR I TE ( 6 ,55) 








55 


FORMAT! ///’ 


SUB 


PITCH 


FAILED TO DETERMI NE 



* /» PITCH, PITCH P ER 100 SET EQUAL TO 100’///) 
RETURN 
END 
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SUBROUTINE ALT2 ( P , FSC , 6S C, RS C , RL I M , SP ♦ I P , IWR , IXPLT ) 
C 

C GIVEN IP COMPLEX POLES OF the VOCAL TRACT 
C TRANSFER FUNCTION, CALCULATES THE FORMANT 
C FREQUENCIES AND BANDW IDTHS AMD SCALES THEM 
C AS DESIRED. PRINTED OUTPUT IS AVAILABLE. 

C 

C P = VECTOR OF IP COMPLEX POLES 
C FSC = FREQUENCY SCALE FACTOR OUT/IN 
C RSC = REAL POLE SCALE FACTOR 
C RUM = REAL POLE MAGNITUDE LIMIT 
C BSC = BANDWIDTH SCALE FACTOR OUT/IN 
C IP = NUMBER OF POLES 
C SP = SAMPLE PERIOD IN SECONDS 
C IWR = C NO OUTPUT PRINTED 
C 1 PRINTED RESULTS 

C IXPLT = 5 FOR PLOT OF POLES 
C 

DIMENSION FORFQ4) , EWU4) 

COMPLEX P (1) ,CPP(14) ,CRP<14) ,CTSM 
DIMENSION XP ( 6 J ,YP ( 6) , I IPEN( 6) 

DATA XP/3 .0,2 .75, -2 .75 ,0.0,0 .0, 2 .5/ 

DATA YP/10. G,C. 0, 0 . 0,2 . 75 ,-2. 75 ,0.0/ 

DATA IIPEN/-3, 3, 2, 3,2,3/ 

ZERC=0. 0 

IF ( IXPLT. NE .5 ) GO TO 9 
C 

NPEN = 3 

CALL NEWPENINPEN) 

DO 2 1=1,6 

CALL PLOT ( XP( I ) , YP ( I) , 1 1 PEN (I ) ) 

2 CONTINUE 

I PEN = 2 
C 

DO 4 1=1, 241 
TEM = 0.0261 8*FL0AT (I ) 

XX = 2.5 * COS(TEM) 

YY =2.5 * SINtTEM ) 

CALL PLOT (XX, YY, IPEN) 

CONTINUE 

IPEN = 3 

CALL PLOT ( ZERO, ZERO, IPEN ) 

HI EG = 0. 25 
ANG = 0.0 
NC = -1 
ITEXT = 4 
NPEN = 4 

CALL NE WP EN (NPEN) 

CO 5 1=1, IP 
XX = 2.5 * R£AL( P(I ) ) 

YY = 2.5 * A IMAG( P ( I ) ) 

CALL SYMBOL(XX,YY,HI EG, IT EXT , AN G, NC ) 

CONTINUE 

IRP = 0 
ICP = 0 

TEST EACH POLE AND PLACE IN PROPER ARRAY 
DO 40 1=1 ,IP 

IF(AIMAG( P( I) J.EQ.O.O) GO TO 30 
IF(ICP.EQ.O) GO TO 20 
DC 10 J=l, ICP 

IF(CA3S( P( I)-CONJG(CPP(J ))) .LT. 0.001) GO TC 40 
10 CONTINUE 

20 ICP=ICP+i 

CPP( ICP)=P ( I ) 

GO TO 40 
30 IRP=IRP+! 



86 



CRP ( IRP ) = PM) 

40 CONTINUE 

C 

C C A LCJ LATE FORMANT FREQ AND BANDWIDTH FOR EACH 
C 

DO 50 1=1, ICP 
A=CABS ( C P P < I) ) 

BW (I)={0.0— ALOG(A) ) / ( 6. 2831 352* SP ) 

TH=ATAN2 ( AIMAG(CPP ( I ) ) , REAL (CPP( I ) ) ) 

TH = A3S ( TH ) 

C DR F ( I )= TH/(SP*6.2 831852) 

50 CONTINUE 

ICPM1=ICP-1 
DO 60 1=1 , 1 C P Ml 
IP1 =1+1 

DO 55 J=IP!,ICP 

IF(FORF( I) .LT.FORF( J )) GO TO 55 
TEM=BW (I ) 

9W ( I ) =Bw( J ) 

8W (J } = TEM 
TEM=FORF< I ) 

FORF ( I )=FORF ( J ) 

FORF( J )=TEM 
CTE M=C FP(I I 
CPP( I) =CPP< J ) 

CPP(J) =CTEM 
55 CONTINUE 

60 CONTINUE 

IF( IWR.EQ.l-) WR IT £(6,70) ( ( I , CP o ( I ) , FOR F ( I ) , B M I )) , 

* 1=1, ICP) 

70 FORMAT <* FORM ANT* , 13, ' DUE TO POLES AT Z=' t F8.4, 

* ' +-J* • ,F8.4, • FORMANT FR£Q= ' , F8 .1 » ' BANDW IDT F= * , F 8 .1 ) 
IF (IRP.EQ.O) GO TO 35 

IF(iWR.EQ.l) WR IT E ( 6 ,80 ) ( ( I, CRP ( I ) ) , I = 1, IR P ) 

80 FORMATC REAL POLE NUMBER ’ ,I3t’ AT Z =',2F8.4) 

85 IF( IWR.EQ.l) WR ITE ( 6 , 90 ) FSC , 3SC , RSC ,RLI M, SP 

90 FORMAT(/' FORMANT FREQUENCY SCALE FACTOR =',F8.4, 

= ' 3 ANO WIDTH SCALE FACTOR =' ,F8.4/ 

* ' REAL POLE SCALE FACTOR =',F8.4, 

* ' REAL PCLE MAGNITUDE LIMIT = ',F8.4, 

* ' SAMPLE PERIOD = ’ , F 9. 6 / /' AFTER MODIFICATION’) 

C 

C ALTER FORMANT FREQUENCIES ANO BANDWICTHS 
C 

CO 100 1=1, -ICP 

A=C A8S( CP P ( I ) ) SC 

IF( A .GT .0.93 ) 4=0.99 

TH = AT AN2 ( AI MAG(CPP(I l)» REAL (CP? < I ) ) )*FSC 
TH=ABS( TH) 

CPP (i)=A*CMPLX(COS (TH), SIN(TH) ) 

BW ( I ) = ( 0. 0— ALGG ( A ) )/(6. 28 31352 *SP) 

FOR F( I ) =T H/ ( 6 .2831 85 2* SP) 

100 CONTINUE 
C 

C ALTER REAL POLE LOCATIONS 
C 

IF (IRP.EQ.O) GO TO 115 
DO 110 1=1, I.RP 
CRP ( I )=CRP( 1 1 *RSC 
, TEM=CA8 S( CRP ( I ) ) 

I F { T E M . GT .RLIM) C RP (I )= CRP ( I ) *RL IM/T EM 
110 CONTINUE 

115 IF ( IWR.EQ.l) WRI TE( 6,70 ) ( ( I , CP P < I ) ,FORF ( I ) , B W ( I ) ) , 

* 1=1, ICP) 

15= ( IRP.EQ.O) GO TO 113 

IF ( IWR.EQ.l-) WR IT E ( 6 ,30 ) ( ( I, CRP ( I ) ) , I =1, IR P ) 

C 

C RECONSTRUCT ARRAY OF POLES 
C 

113 INO =0 

DO 120 1=1, ICP 
IND=I NO+1 



P( INO) = CPP( I ) 

I NO =1 ND+1 

P( I NO ) =CONJG ( CPP ( I ) ) 

120 CONTINUE 

IF (IRP.EQ.OJ GO TO 135 
00 130 1=1, IRP 
IN C= IN 0+1 
P( INO)=CRP(I ) 

130 CONTINUE 

135 IF(IWR.EQ.l) WRIT E ( 6 , 140 ) INO 
140 e ORMAT( 10X,' RECCN PGLES',14) 

IF (IXPLT.NE. 5) RETURN 
C 

ITEXT = 3 

00 150 1=1, IP 

XX = 2.5 * RE AL ( P ( I ) ) 

YY = 2.5 * AI^AG(P ( I) ) 

CALL SYMSQL(XX,YY,HEIG» ITEXT, ANG,NC l 
150 CONTINUE 
IPEN = -3 
XX = 5.0 
YY = -10. 0 

CALL PLOT (XX ,YY, I?EN ) 

RETURN 

END 
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SUBROUTINE NEWCF ( I P, P, A, IWR) 

C 

C DETERMINES THE COEFFICIENTS OF THE 
C PREDICTOR POLYNOMIAL FROM THE ROOT OF 
C ThE CHARACTERISTIC EQUATION 
C 

C IP = ORDER OF THE POLYNOMIAL 
C 

C P = COMPLEX ROOTS OF CHARACTERISTIC ECN 
C 1.5. POLES OF THE FILTER 

C 

C A = ARRAY OF REAL COEFFICIENTS 
C 

C IWR = 1 FOR PRINTING OURING SUBROUTINE 
C 

C IF ALL COMPLEX ROOTS ARE IN CONJUGATE FAIRS 
C ALL OF THE COEFFICIENTS SHOULD BE REAL 
C THIS CAN BE CHECKED WITH OUTPUT 
C 

C0MFLEX*16 PP (14) , AA(1A) 

C0M°LEX*8 P < I P ) 

REAL*4 A( IP I 
Z =0.0 

00 10 I = 1,IP 

AA( I) = P (I ) 

pp<i) = pm 

10 CONTINUE 
X = IP 
M = IP-1 
DO 40 L = 1,M 
00 30 I = 2.K 
AA( I) = AA ( I )-*- AA ( I— 1 ) 

30 CONTINUE 

* = Qo"~20 I - l.K 
J = I+L 

A A ( I ) = PP( J )*AA ( I ) 

20 CONTINUE 

40 CONTINUE 

K = IP/2 
K = 2*K/( IP-K ) 

DO 50 I = K , I F» 2 
AA ( I ) = -AA (I ) 

50 CONTINUE 

DO 60 I ■ 1 ,1 F 

J = IP+l-I 
A I J ) =RE AL ( AA (I ) ) 

PP ( J) = AA ( I ) 

60 CONTINUE 

IF(IWR.NE.l) RETURN 

WRI TE ( 6 , 7 0) ((I,PP(I)),I = 1,IP) 

70 FORMAT!/' RECONSTRUCTED POLYNOMIAL COEFFICIENTS'/ 

* 20X, • IMAGINARY TERMS SHOULD BE ZERO'/ 

* ( IX, 15, F13. 3,318.4) ) 

RETURN 

END 
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SUBROUTINE RECON( A, I P ,R MS , I VF ,1 PP , N , S ) 

C 

C RECONSTRUCTS SPEECH SAMPLES FROM LPC COcFF, ETC 
C 

C A = VECTOR OP LPC CGEFF 
C IP * NUMBER OF COEF = (ORDER OF FILTER) 

C RMS = RMS VALUE OF ERROR SIGNAL 
C I VF =0 UNVOICED 
C =1 VOICED 

C IPP = PITCH PERIOD IN NUMBER OF SAMPLES 
C N = SAMPLES PEP FRAME 
C S = SAMPLE VECTOR (OUTPUT) 

C 

DIMENSION A(1 ) » S ( 1 ) ,X(270) ,XX ( 14 ) , AC (14 ) 

CAT A XX,RMSO, I SEED , I VF 0/15*0. 0,122 4,0/ 

00 10 I = 1,1 F 

X( I ) = XX ( I ) 

10 CONTINUE 

NIP a N+I P 
NS = 1+1? 

C 

C IF CURRENT PULSE UNFINISHED DON’T CHANGE COSFF YET 
C 

I F ( IVFO.NE.O ) GO TO 400 
C 

C UPDATE COEFF 
C 

100 RM SO = $QRT(RMSO*RMS) 

IF(IVF.EQ.O) RMSO = R M$ 

IF ( RMSO.LT. ( RMS/2 .0) ) RMS 0= RMS/2 .0 
DO 105 I = 1,IP 
AO ( I ) = A (I ) 

105 CONTINUE 

IVFC = I VF 
IPPO = IPP 
r 

C TEST IF VOICED 
C 

IF( IVFO.NE.O) GO TO 300 
C 

C RECCNSTRUCT UNVOICED SPEECH 
C 

200 E = RMSQ*GGNOF(ISEED) 

CO 210 I = 1 , IP 
NS M I = NS** I 
E = E-A< I )*X ( NSM I ) 

210 CONTINUE 
X ( N S ) = E 

IF (NS .Gc.NIP) GO TO 600 
NS = NS + 1 
GO TO 200 
C 

C START VOICED PULSE 
C 

300 NP = 1 

EX =RMSO* SORT (FLOAT ( IPPO) ) 

C 

C TEST FOR BEG INI NG OF PULSE PERIOD 
C 

400 IF (NP.GT. I PPO) GO TC 100 
E = C.O 

IF (NP.EO. 1) E = -EX 
C 

C RECONSTRUCT VOICED SPEECH 
C 

500 CO 510 I = 1 ,IP 
NSM I = NS— I 
E = E— A ( I )*X(NSMI ) 

510 CONTINUE 
NP = NP+1 
, X( NS) = E 

IF< NS.GE.NIP ) GO TO 600 



90 



NS = NS +1 
GO TO 400 
C 

C SAVE VALUES AND FREPARE OUTPUT 
C 

600 00 610 I = 1, IP 

XX ( I ) = X< N+I ) 

610 CONTINUE 

DO 620 I = 1 t N 
S( I ) = X( I+IP) 

620 CONTINUE 
RETURN 
' END 



SUBROUTINE RMS ( X » N » V A L ) 

C 

C DETERMINE THE RMS VALUE OF A S ET OF CAT A 
C 

C X = VECTOR OF -IN PUT SAMPLES 
C N = NUMBER OF SAMPLES 
C VAL = RMS VALUE RETURNED 
C 

DIMENSION X ( 1 ) 

VAL = 0.0 
DO 10 I = 1, N 
VAL = VAL+X( I )**2 
10 CONTINUE 

VAL = SQRT ( VAL/FLQAT (NT) 

RETURN 

ENC 
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SUBROUTINE W INOW ( X , Y, N, IW IN ) 

C 

C X = VECTOR OF UNWINDQWED SAMPLES 
C Y = VECTOR OF WINDOWED SAMPLES (OUTPUT) 

C N = NUMBER OF SAMPLES 
C IW IN = TYPE OF WINDOW 
C 0 = RECTANGULAR (COPY ONLY) 

Cl* HAMMING (ALPHA = 0.54) 

C 2 = BARTLETT 
C 3 = 3LACKMAN 
C 4 = HANNING 
C 

DIMENSION X(1),Y(1) 

DATA P I t T WOP I , FOR PI /3. 1415926,6.2331353,12 .5 66371 / 
IF< IWIN.LT.O.CR.IWIN.GT .4) GO TO 999 
AN = FLOAT(N) 

GO TO (110,210,310,410) »IWIN 
C 

C RECTANGULAR WINDOW COPY VECTOR 
C 

10 CO 20 1=1 t N 

Y( I ) = X( I ) 

20 CONTINUE 

RETURN 
C 

C HAMMING WINDOW 
C 

110 00 120 1 = 1 , N 

A J = FLOAT (I -1 ) 

Y(I) = X(I)*( C.54-0.A6*C0S(TW0PI*AJ/(AN-1.0) ) ) 

120 CONTINUE 
RETURN 
C 

C BARTLETT WINDOW 
C 

210 NN = N/2 

NNN = NN+1 
00 220 1=1, NN 
AJ = FLOAT (-1-1) 

Y ( I ) = X(I )*2.0*AJ/ (AM-1.0) 

220 CONTINUE 

DO 230 I = NNN, N 
AJ = FLOATM-1) 

Y( I ) = X{ I )*2.0*( 1. 0-AJ/(AN-1.0) ) 

230 CONTINUE 
RETURN 
C 

C BLACKMAN WINDOW 
C 

310 CO 320 1 = 1 ,N 

AJ = FLOAT(I.-i) 

Y ( I ) = X( I)*( 0.42-0. 5*C0S(TW0PI*AJ/(AN-1. 0) ) 

* +0 . 08 *CQS ( FORPI*AJ/ (AN-1 .C) ) ) 

320 CONTINUE 
RETURN 
C 

C HANNING WINDOW 
C 

410 CO 420 1=1 . N 

AJ = FLOAT(I-i) 

Y ( I ) = X( I)*0 .5*(1 .O-COS(TWOPI*AJ/(AN-1.0> ) ) 

420 CONTINUE 
RETURN 

999 WRI TE (6 ,998 1 

998 FORMAT! //10X, '** ERROR SUBR WINDOW **’//) 

STOP 

END 
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no ooooooooooooooo 



SUBROUTINE VPLTIN (N) 



SUBROUTINE CREATES A VERSAPLOT GRAPH CF 60 FRAMES 
OF VOICE SAMPLES (128 SAMPLES / FRAME) 

CALL VPLTIN TO INITIALIZE EACH PLOT 

CALL VPLT FOR £ACH FRAME 



N = NU MBEP CF 
X= VECTOR OF 



samples per frame 

SAMPLES 



CALLING PROGRAM SHOULD 
CALL PLOT (X, Y, 999) 
TO COMPLETE PLOTTING 



ISSUE 



DIMENSION X(768), Y( 256) ,X0(8) ,Y0(8) 

DATA XO/O.OtO .0,7. 0,0. <3,7.0, 0.0, 7. 0,0.0/ 

DATA Y0/10. ,-10., 10. ,-iC. ,10. ,-10. , 10. ,-10 ./ 
DO 10 1=1,768 
X(I ) =F LOAT (T ) /128 . 0 
0 CONTINUE 

CALL PLOTS ( IA, IB, IC ) 

NPEfs=2 

CALL NEW PEN (N PEN ) 

NP LT = 1 
I PEN = -3 

CALL PLOT (XO(NPLT) ,Y0( NPLT ) , I P EN ) 

I P E N =2 
IX=768 
IY= 11 
RETURN 



ENTRY VPLT(Y) 

DO ICO 1=1, N 
IX= IX +1 

IF( IX.LE.768 ) GO TO 50 
IX = 1 
IY = I Y— 1 

YS = 2 . 0+0. 7*F LCAT ( I Y ) 

IF ( IY .GE .1 ) GO TO 4C 
NPLT = NPLT +1 
I PEN=-3 

CALL PLOT (X0(NPL7),Y0(NPLT) ,IPEN) 
I?EN=2 
I X= 1 
IY= 10 

YS=2.0+0.7*FLCAT( IY) 

40 IP EN =3 

YY = Y( I J/100 .0 +YS 

CALL PLOT(XdX) ,YY, IPEN) 

IP EN=2 
GO TC 100 

50 YY=Y(I ) /I 00. O+YS 

CALL PLOT (X (I X), YY, IPEN ) 

100 CONTINUE 
RETURN 
ENC 
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APPENDIX A. 3 



POWER SPECTRAL DENSITY ANALYSIS AND 
PLOTTING FROGRAM 



DIMENSION X (2 56 ) 

READ( 5, 8, END = SO) INUM,ISKIP,I WIN 

8 FORM AT (315) 

IF ( ISKIP.EQ-0 ) GO TO 10 
DO 9 1=1, I SKI P 
REAl{2»25 ,END=90) X 

9 CONTINUE 

10 M = 8 

REA0(2t25 » END =90 ) X 
CALL PSDINT(X,M) 

CALL SPLINT 
K = 0 

20 READ! 2 » 25 > END =90) X 

25 FORMAT (128A4) 

CALL PSD { X.MtIWIN) 

IF ( K .L E.6 ) WR ITE < 6 » 30) l X( J ) t J = 1 1 12 6 ) 
30 FORMAT (IX » 1 OF 12 « 5 ) 

K=K+ 1 

CALL SPL (X) 

IF(K.GT.INUM) GO TO 90 
GO TO 20 
90 IP EN = 999 

CALL PLOT (AX, Y, IP EN 5 

STOP 

END 
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SUBROUTINE SPLINT 
C 

C SUBROUTINE PLOTS THE POWER SPECTRAL DENSITY 
C (LOG OP MAGNITUDE) FOR 128 FREQUENCI ES WHICH 
C IS INPUT IN MAGNITUDE FORM IN VECTOR Y 
C 

C VALUES IN Y SHOULD 8E BETWEEN O.Ol AND 100.0 
C 

C CALL SFLINT TO INITIALIZE PLOTTING 
C 

C CALL SFL (Y) FOR EVERY SET OF 123 PSD VALUES 
C 

C CALLING PROGRAM SHOULD ISSUE CALL PLOT (X,Y,999I 
C WHEN PLOTTING TS COMPLETE 
C 
C 

DIMENSION Y til ),X(128)»YY(128 ) 

DIMENSION ROR GX ( 6 ) ,R0RGY(6) ,GX(19),GY(i9),IGP<l9) 
C 

C DATA FCR SIX PLOT ORIGINS 
C 

CAT A RORGX/O .1,-1 .2,-1 .2* 3. 3,-1. 2,- 1.2/ 

DATA RORGY/O. 5, 4. 0,4.0 ,-17.0,4.0,4.0/ 

C 

C OATA TC PLOT AXIS 
C 

DATA GX/ 7. 5, 7. 5, 6. C, 6. 0,4. 5. 4. 5, 3. 0,3. 0,1. 5, 

* 1.5, 0.0, 0.0, -0.1 ,0.0, -0.1, 0.0, -0.1, 0.0, -0.1/ 
DATA GY/0.0, — C.l , 0. 0,— 0.1 ,0.0 ,— 0. 1 ,0.0 ,—0.1 , 

0.0,— 0 .1,-0 .1,0 .800, 0.300, 0.600,0.o00,0. A, C.4, 

* 0 . 200 , 0 . 200 / 

DATA I GP/2, 2, 2, 2, 3, 2, 3, 2, 3, 2, 3, 2, 2, 2, 2, 3, 2, 3, 2/ 

DO 10 1=1 ,128 

X( I ) = Fj_OA T ( 1-1 ) *0. 05 85 9-0.04 
10 CONTINUE 

CALL PLOTS (I A, I 3, I C ) 

IFLAG=0 
IP LT N = 1 
I 5 C A N =0 
15 IPEN=-3 

CALL PLOT (ROR GX ( IP LTN ) » RORGY ( IPLTN ) ,IPEN) 

NPEN=4 

CALL NEWPEN(NPEN) 

DO 30 1=1,19 

CALL PLOT (GX ( I ),GY( I) ,IGP(D) 

30 CONTINUE 

NPE N=2 

CALL NEWPEN(NPEN) 

RETURN 

C 

c 

ENTRY SPL (Y) 

I SC AN = I SCAN + 1 
C 

C RETURN IMMEDIATELY IF PLOT FULL 
C 

IF< IFLAG.EO.il RETURN 
C 

C CONVERT DATA TO LOG PLOT 
C 

DO 50 1=1,128 
YTEM=Y( I ) 

IF (YTEM.LT .0 .100 ) YTEM=C.100 
YY ( I ) =0.1 0+0.2000* A L0G10 (YT EM ) 

50 CONTINUE 
IPEN=-3 
XSCANO=0. 04 
YSCAN0=0 . 1 

CALL PLOT ( XS CANO, YSCANO, IP BN ) 

IPSN=3 

CALL PLOT (X (1 ), YY( 1 ), IPEN ) 

I PE N = 2 
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CO 60 1 = 2,128 

CALL PLOT (X(I»,YY( I), IFEN) 
60 CONTINUE 

IF (ISCAN.LE. 25) RETURN 

ISCAN=0 

IP L TN = I P L TN+ 1 

IF ( IPLTN.LE.6 ) GO TO 15 

IFLAG=1 

RETURN 

ENC 
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SUBROUTINE PS Cl NT ( X , M ) 

C 

C POWER SPECTRAL DENSITY BASED ON ALGORITHM 

C PRESENTED BY C M RACER IN, 'AN IMPRCVED 

C ALGORITHM FOR HIGH SPEED AUTOCORRELATION 

C WITH APPLICATION TO SPECTRAL ESTIMATION,' 

C IEEE TRANS AUDI 0,E LECTRCACOUSTI CS , V AU-18,DEC70 

C 

C X = VECTOR OF INPUT SAMPLES 

C M = POWER OF 2 FOR NUMBER OF SAMPLES 

C IWIN = 0 NO WINDOW 

C 1 HAMMING (ALPHA = 0.54) 

C 2 BARTLETT 

C 3 BLACKMAN 

C 4 HANNING 

C 

C FIRST CALL IS TO PSDINT AND THEN EACH SUCESSIVE 

C CALL FOR THAT STRING OF DATA SHGULC BE TO PSD 

C TO START A FRESH STRING OF DATA CALL PSDINT AGAIN 

C 
C 

DIMENSION X( 256 ), I WK ( 11 ) 

COMPLEX XN(512) ,XNP(512 ),YN(512 ), A I ( 512 ) 

DATA XN,XNP/1C24*{ C.O,C.O) / 

MM = M+l 
M2**M 
NN=2*N 
C 

C SPECIFY COEFFICIENTS NEEDED IN ADDITION 

C OF NEXT X ( F ) VECTOR TO CURRENT X(F) VECTOR 

C TO * AKE Y ( F ) VECTOR. IN BINARY REVERSE ORDER. 

C ' 

NNN = NN-1 
DO 90 1=1, NNN, 2 
AKI) = (1.0, C.O) 

II = 1 + 1 

AI ( II ) = (-1 .0,0.0 ) 

SO CONTINUE 

CALL FFRDR2 <AI,MM,IWK) 

AI MG =0.0 

DO 101 I = 1 , N 

XN ( I ) = CMPLX (X (I ) , A I M G ) 

101 CONTINUE 
C 

C FFT OF CURRENT X(T) VECTOR, LAST HALF ZERO. 

C 

CALL FFT2 (XN,MM,IWK) 

RETURN 

C 

C USE THIS ENTRY FOR EACH FRAME AFTER FIRST 

C 

ENTRY PSD (X,M , IWIN) 

MM = M+l 
N = 2**M 
NN = 2*N 
AN = FLOAT (,N) 

ANN = FLOAT ( NN ) 

A I M G = 0.0 

DO 110 I = 1 ,N 

XNP(I) = CMP LX( X( I ) , A I MG ) 

110 CONTINUE 
C 

C FFT OF NEXT X(T) VECTOR, LAST HALF ZERO. 

C 

CALL FFT 2 (XNP,MM,IWK) 

C 

C FORM Y { F) VECTOR, CO EFF IN REV BINARY ORDER. 

C 

CO 120 I = l.NN 

YN ( I ) = ( XN ( I ) + AI ( I )*X NP( I ) ) *CONJ G ( XNP ( I ) ) 

120 CONTINUE 

CO 123 I = 1,NN 
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c 

C FORM CONJG TO PREFORM INV OFT 

C 

YN( I) = CONJG (YN(I )) 

123 CONTINUE 
C 

C INV FFT OF Y(F) GIVES RXX(TAU) 

C 

CALL FFT2RV ( YN,MM,IWK) 

00 143 I = 1,NN 
YN( I) = CCNJG < YN ( I ) ) / ANN 
143 CONTINUE 

CALL W IND2 ( YN, N» IWIN ) 

CALL FFT2 (YN,M,IWK) 

CALL FFRDR2 (YN,M,IWK) 

CO 153 I » 1, N 

X(I ) = CABS (YN( I) ) /(AN**2) 

153 CONTINUE 
C 

C MOVE NEXT X(F) INTO CURRECT X{F) 

C 

DO 160 I = I* NN 
XN ( I ) = X NP { I ) 

XNP (I ) = (0 .0,0 .0 ) 

160 CONTINUE 
RETURN 
ENC 



SUBROUTINE WIND2 (B,N,IWIN) 
COMPLEX B< 512 ) 

DATA PI, T WO PI/3. 1415926,6. 233185/ 
AN = FLGAT(N) 



50 


GO TO (200 ,300,400,100) , I 
RETURN 




100 


00 190 I = 1 ,N 
AJ = FLOAT(I-l) 

F = 0.5*(1.0-C0S ( ( TWOP I*AJ )/ (AN-1. 
B( I) = B< I )*F 


0) ) ) 


190 


CONTINUE 

RETURN 




200 


DO 290 I = 1 ,N , 

AJ = FLOAT ( I- 1) 

F = 0.54-0. 46*C0S ( (TWO P I* AJ i / ( AN-1 


.0) ) 




3(1) = B( I ) *F 




290 


CONT INUE 
RETURN 




200 


DO 390 I = 1 , N 





AJ = FLCAT (4-1 ) 

IF ( I ,LE. ( N/2) ) F = 2.0*AJ/(AN-1.0) 

IF( I .GT .(N/2 ) ) F = 2. 0-2.0*AJ/< AN-1 .0 ) 
e( I ) = B( I )*F 
290 CONTINUE 
RETURN 

400 DO 490 I = 1 ,N 
AJ = FLOAT( 1-1) 

F = 0 .42“Q.5*C0S ( T WOP I * AJ/ ( AN-1 . C ) ) +0 .08 * 
* CGS(4.0*PI*AJ/( AN-1. 0) ) 

3( I 1 = B ( I ) *F 
490 CONTINUE 
RETURN 
END 
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APPENDIX A. 4 



NiNE TRACK TO SEVEN TRACK TA °E CONVERSION 
PROGRAM 



DIMENSION DATU024 ) , IDATQ024) 

FACTOR=(2 .0**23 )/ 250.0 
HTEST =2**25— 1 

ltest=-htest 

NF I LES = 6 
REWIND 2 
REW IND 4 
N=! C24 
C 

CO 200 1=1 »NF ILES 
WRITE<6,11) I 
11 FORMAT! ' 1FILE • ,14) 

C 

DO 100 J=1 ,50 

READ( 2, 15, END=19C,ERR=30) DAT 

15 FORMAT I128A4) 

GO TO 50 

30 WRITE(6,21) 

21 FORMAT (60X ,' READ ERROR') 

50 WRITE ( 6, 16 ) J 

16 FORMAT <10X , 'RECORD HAS BEEN READ*, 14) 

IF(J.EQ.l) WRIT E (6 , 17 ) DAT 

17 FORMAT { IX, 10F12.3) 

C 

DO 80 K=1 , 1 024 

ID AT (K)=I FIX! DAT! K ) *FAC TOR) 

IF{ IDAT (K) .GT. HTEST) WRITE(6,18) I , J , K 

18 FORMAT! 40X, ' TCC LARGE FILE* ,14,’ RECORD', 14, 

* • ITEM*, 14) 

IF( IDAT(K).LT.LTEST) WRITE(6,19) I,J,K 

19 FORMAT! 40X, 'TOO SMALL FIlE',14,' RECORD', 14, 

* ' •‘IT EM' , 14 ) 

80 CONTINUE 

C 

IF(J.EG.l) WRITE!6,20) IDAT 

20 C CRMAT(LX, 10112) 

CALL MORE! IDAT ,N ) 

WRITE ( 4,25 ) I D A T 

25 FORMAT (128! 8A4) ) 

100 CONTINUE 

C 

WR ITE! 6, 26 ) 

26 FORMAT ( 5X ,' ALL 50 RECORCS READ') 

155 READ(2,15,EN0=190) DAT 

GO TO 155 
190 WRI TE( 6 ,2 7) 

27 FORMAT! 2X, ' END OF FILE') 

ENCFILE 4 

20C CONTINUE 
C 

STOP 

END 
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APPENDIX B.l COMPUTER ANALYSIS AMD MODIFICATION OF VOICED 

SPEECH 

The 15 frame ( 334 msec. ) segment of speech analyzed in 
this appendix is the "long e" sound (as in need) and is 
spoken by a woman. The process illustrated shows both 
direct reconstruction and r econs truct i on with the pitch 
reduced by a factor of 0.58 and the formant frequencies 
reduced by a factor of 0.88. 



— — -r- 

Figure B.1.1 WAVEFORM OF INPUT SPEECH 
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Figure B.1.2 LOGARITHMIC POWER SPECTRAL DENSITY OF INPUT SPEECH 
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Figure B. 1.3(a) Processing Summary of Frame 
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Figure B. 1.3(b) Processing Summary of Frame 
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Figure B. 1.3(c) Processing Summary of Frame 
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Figure B. 1.3(d) Processing Summary of Frame 
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B'igure B. 1.3(e) Processing Summary of B^raine 
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Figure B.1.4 WAVEFORM OF ERROR SIGNAL 



Figure B.1.5 WAVEFORM OF FILTERED ERROR SIGNAL 
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Figure B. 1.6(a) VOCAL TRAQT POLE LOCATIONS 
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Figure B. 1.6(b) VOCAL TRACT POLE LOCATIONS 








Figure B.1.8 WAVEFORM OF MODIFIED OUTPUT SPEECH 
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Figure B.1.9 LOGARITHMIC POWER SPECTRAL DENSITY OF UNMODIFIED OUTPUT SPEECH 
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Figure B.1.10 LOGARITHMIC POWER SPECTRAL DENSITY OF MODIFIED OUTPUT SPEECH 



APPENDIX B . 2 COMPUTER ANALYSIS AND MODIFICATION OF 

UNVOI CED SPEECH 

The 15 frame ( 384 msec. ) segment of speech analyzed in 
this appendix is the "sa" sound (begining of salt) and is 
spoken by a woman. The process illustrated shows both 
direct recons tr uct i on and r econs tr uct i on with the pitch 
reduced by a factor of 0.58 and the formant frequencies 
reduced by a factor of 0.88. 
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Figure B.2.2 LOGARITHMIC POWER SPECTRAL DENSITY OF INPUT SPEECH 
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Figure B. 2. 3(a) Processing Summary of Frame 
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Figure B. 2. 3(b) Processing Summary of Frame 
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Figure B. 2. 3(c) Processing Summary of Frame 
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Figure 2.3(d) Processing Summary of Frame 
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Figure B. 2. 3(e) Processing Summary of Frame 
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Figure B.2.4 WAVEFORM OF ERROR SIGNAL 



Figure B.2.5 WAVEFORM OF FILTERED ERROR SIGNAL 
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Figure B. 2. 6(a) VOCAL TRACT POLE LOCATIONS 
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Figure 6.2.6(b) VOCAL TRACT POLE LOCATIONS 
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Figure B.2.9 LOGARITHMIC POWER SPECTRAL DENSITY OF UNMODIFIED OUTPUT SPEECH 
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Figure B.2.10 LOGARITHMIC POWER SPECTRAL DENSITY OF MODIFIED OUTPUT SPEECH 



APPENDIX C DESCRIPTION OF VOICE TAPE 



The audio recording which is available from the author has 
four sections each of which contains three segments of 
speech. These three speech segments are of the following 
sounds : 



Segment 1 - Five long vowels. 

"a e i o u" 

Segment 2 - Four words which are combinations of 
fricatives and voiced sounds. 

"sat free h i p done" 

Segment 3 - A sentence with a varity of sounds. 
"Every salt breeze comes from the sea." 

Each of these segments Is repeated in each segment of the 
tape. Each section of the tape shows the effects of a 
different step in the processing. 



Section 1 - Unprocessed speech, the recording used 
for input to the processing system. 

Section 2 - Speech which has been converted to 
digital form and then converted back to analog 
form with no other processing. 

Section 3 - Speech which has been encoded into a 
set of LPC parameters and then decoded using the 
same parameters (i.e. no modification). 

Section 4 - Speech which has been encoded into a 
set of LPC parameters and those parameters altered 
to reduce the pitch frequency by a factor of 0.56 
and to reduce the formant frequencies by a factor 
of 0.88. The same LPC decoding process is then 
used to reconstruct the speech segment. 
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